Introduction

Prostate cancer accounts for one-third of noncutaneous cancers diagnosed in US men,1 is a
leading  cause  of  cancer-related  death  and  is,  appropriately,  the  subject  of  heightened  public 
awareness  and  widespread  screening.  If  prostate-specific  antigen  (PSA)2  or  digital  rectal  screens 
are  abnormal,  a  biopsy  is  considered  to  detect  or  rule  out  cancer.  Pathologic  status  of  biopsied 
tissue  forms  the  definitive  diagnosis  for  prostate  cancer  and  constitutes  an  important  cornerstone 
of  therapy  and  prognosis.4  There  is,  hence,  a  need  to  add  useful  information  to  diagnoses  and  to 
introduce  new  technologies  that  allow  efficient  analyses  of  cancer  to  focus  limited  healthcare 
resources.  For  the  reasons  underlined  above,  there  is  an  urgent  need  for  high-throughput, 
automated  and  objective  pathology  tools.  Our  general  hypothesis  is  that  these  requirements  are 
satisfied  through  innovative  spectroscopic  imaging  approaches  that  are  compatible  with,  and  add 
substantially  to,  current  pathology  practice.  Hence,  the  overall  aim  of  this  project  is  to 
demonstrate the utility of novel Fourier transform infrared (FTIR) spectroscopy-based,
computer-aided  diagnoses  for  prostate  cancer  and  develop  the  required  microscopy  and  software 
tools  to  enable  its  application. 

FTIR  spectroscopic  imaging  is  a  new  technique  that  combines  the  spatial  specificity  of  optical 
microscopy  and  the  biochemical  content  of  spectroscopy.5  As  opposed  to  thermal  infrared 
imaging,  FTIR  imaging  measures  the  absorption  properties  of  tissue  through  a  spectrum 
consisting of (typically) 1024 to 2048 wavelength elements per pixel.6 Since mid-IR (2-12 µm
wavelength)  spectra  reflect  the  molecular  composition  of  the  tissue,  image  contrast  arises  from 
differences  in  endogenous  chemical  species.  As  opposed  to  visible  microscopy  of  stained  tissue 
that  requires  a  human  eye  to  detect  changes,  numerical  computation  is  required  to  extract 
information from IR spectra of unstained tissue. Extracted information, based on a computer
algorithm,  is  inherently  objective  and  automated.  Recent  work  has  demonstrated  that  these 
determinations  are  also  accurate  and  reproducible  in  large  patient  populations.7  Hence,  we 
focused,  in  the  first  year  of  this  project,  on  demonstrating  that  the  laboratory  results  could  be 
optimized  using  novel  approaches  to  fast  imaging.  This  is  a  critical  step,  since  we  propose  next 
to  analyze  375  radical  prostatectomy  samples.  We  have  been  able  to  optimize  data  acquisition 
parameters  and  develop  a  novel  algorithm  for  processing  data  that  enables  almost  50-fold  faster 
imaging.  Briefly,  the  idea  behind  the  process  is  illustrated  in  Fig  1.  In  this  performance  period, 
we  sought  to  use  acquired  data  to  establish  the  use  of  IR  imaging  for  validating  cancer  diagnosis 
(task  2),  develop  a  calibration  and  prediction  model  for  grading  and  perform  extensive  validation 
(task  2).  Finally,  we  sought  to  develop  a  mathematical  framework  to  relate  disparate  pieces  of 
information  to  outcome  (task  3). 
Figure 1. (A) Conventional imaging in pathology requires dyes and a human to recognize
cells. In chemical imaging data cubes (B), both a spectrum at any pixel (C) and the spatial
distribution of any spectral feature can be seen, e.g., in (D), nucleic acids (left, at ~1080 cm⁻¹)
and collagen (right, at ~1245 cm⁻¹). Computational tools can then convert
chemical imaging data to knowledge used in pathology (E).


Body 

Specific activities and tasks, as per the statement of work, during this performance period are
described below. Details of performance for past periods are given in the past annual
reports, which are attached for quick reference by the reviewers.

Task  1.  Perform  infrared  spectroscopic  imaging  on  prostate  biopsy  specimens 
Goal: Data will be acquired from samples identified in Task 2, sub-task a. Data at 4 cm⁻¹ spectral
resolution, imaging ~6 micrometers of sample per pixel, will be acquired with a signal-to-noise
ratio of greater than 1000:1. At least 375 samples will be imaged to provide an estimated
40  million  spectra.  Data  will  continuously  be  available  for  analysis  in  this  period.  (Months  8-18) 

Activities: A focal plane array (FPA) detector was interfaced to an infrared
interferometer and microscope to record high-throughput spectroscopic imaging data. A
rapid-scanning FTIR imaging system that can image more than 16,000 spectra per second was
available. The system, however, provided data with a low signal-to-noise ratio (SNR). To increase
the SNR of acquired data, there are typically hardware or experimental approaches. It is
prohibitively expensive to procure new hardware. Hence, the typical approach has been to
increase SNR by averaging successively acquired images. The SNR improves as √n, where n
is the number of averaged spectral data cubes. Hence, we focused next on developing
post-processing methods, as detailed below.
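
The square-root scaling of SNR with averaging can be checked numerically. The sketch below is illustrative only: it assumes a constant signal with additive, independent Gaussian noise, and the function name and parameter values are hypothetical rather than taken from the instrument software.

```python
import numpy as np

def snr_after_averaging(n_frames, signal=1.0, noise_sd=0.1, n_pixels=20000, seed=0):
    """Empirical SNR of the average of n_frames noisy measurements of a constant signal.

    Assumption: additive, independent Gaussian noise in every frame.
    """
    rng = np.random.default_rng(seed)
    frames = signal + rng.normal(0.0, noise_sd, size=(n_frames, n_pixels))
    averaged = frames.mean(axis=0)            # co-add the frames
    return averaged.mean() / averaged.std()   # mean signal over residual noise

snr_1 = snr_after_averaging(1)
snr_16 = snr_after_averaging(16)
gain = snr_16 / snr_1                         # expected to be close to sqrt(16) = 4
```

Under this scaling, a 50-fold SNR gain from averaging alone would need roughly 2500 times as many scans, which is why post-processing is attractive.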

Goal:  Develop  a  route  to  mathematically  transform  data  to  eliminate  noise  and  yield  high  quality 
data.  A  custom  algorithm  will  be  developed  in  which  the  covariance  matrix  is  employed  to  first 
perform a factor analysis equivalent operation followed by image separation from noise and
re-transformation. Software to automatically correct data will be available. (Months 2-6)

Activities: The methodology was developed and demonstrated to show a 50-fold improvement
in SNR. Results are reported in a publication in Analyst and were presented at 2 conferences.


Our approach was the following. The first and simplest approach to higher fidelity imaging
required co-adding a large number of array detector snapshots of the same scene, which resulted
in long dwell times of the mirror at every optical retardation8. We operated the interferometer in
step-scan mode and wrote custom software to analyze the data. The advantages of this frame
co-addition process were limited by the noise characteristics of the detector. Hence, an optimal
combination of frame co-addition and repeated scanning was implemented, as previously
proposed9. Though these methods make the best use of the available hardware, they
unfortunately require large increases in data acquisition time, as the SNR improvement scales less
than linearly with the acquisition time. In order to obtain high SNR data using acquisition-side
approaches, the trade-off with respect to time is unavoidable. Such a trade-off limits the possible
applications of FT-IR imaging as a routine microscopic analysis tool in prostate cancer.

For  a  finite  data  acquisition  time,  other  schemes  to  extract  low  noise  information  are  available10 
but  these  methods  neglect  the  image  as  a  whole  and  result  in  loss  of  image  fidelity.  While  we 
implemented  these  schemes  here,  it  was  clear  that  structural  fidelity  of  the  tissue  image  was 
being  affected.  Hence,  we  turned  our  attention  to  another  alternative  to  hardware  improvement  or 
co-addition  schemes  for  high  fidelity  imaging.  This  approach  is  the  use  of  mathematical  noise 
reduction  techniques.  For  example,  a  procedure  based  on  the  Minimum  Noise  Fraction  (MNF) 
transform  was  adopted  from  the  satellite  and  airborne  imaging  community11.  With  rapid 
development  of  powerful  computers  and  increased  storage  capacities,  using  computation  to 
enhance instrument performance is becoming an attractive option. Using chemometric methods
to enhance acquired FT-IR imaging data has been a relatively recent development. A convenient
approach is to use an eigenvalue decomposition of the data using a forward transform, e.g. PCA.
After selecting eigenimages with sufficient SNR, the selected data are inverse transformed to
yield the entire dataset with lower noise content. This approach was used12 to examine phase
compositions by enhancing contrast between different regions. PCA reorders data in decreasing
order of variance.
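
The forward transform, component selection and inverse transform described above can be sketched as follows. This is a minimal illustration on synthetic data, not the project software: the function names, the number of retained components and the synthetic spectra are all assumptions made for the example.

```python
import numpy as np

def pca_denoise(cube, n_keep):
    """Denoise a (rows, cols, bands) data cube by keeping n_keep principal components."""
    rows, cols, bands = cube.shape
    X = cube.reshape(-1, bands)
    mean = X.mean(axis=0)
    Xc = X - mean
    cov = Xc.T @ Xc / (Xc.shape[0] - 1)
    evals, evecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    order = np.argsort(evals)[::-1]             # reorder by decreasing variance
    V = evecs[:, order[:n_keep]]
    scores = Xc @ V                             # forward transform (eigenimages)
    X_hat = scores @ V.T + mean                 # inverse transform of retained part
    return X_hat.reshape(rows, cols, bands)

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Synthetic check: two hypothetical pure spectra mixed per pixel, plus white noise.
rng = np.random.default_rng(0)
rows, cols, bands = 32, 32, 64
abundances = rng.random((rows, cols, 2))
axis = np.linspace(0.0, 1.0, bands)
pure_spectra = np.stack([np.sin(3.0 * axis), np.cos(5.0 * axis)])
clean = abundances @ pure_spectra
noisy = clean + rng.normal(0.0, 0.2, clean.shape)
denoised = pca_denoise(noisy, n_keep=3)
```

Here the first `n_keep` components are retained, which is the simplest selection rule; choosing which components to keep is the harder problem taken up later in this section.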

A  similar  technique  called  MNF  transform  was  proposed13  to  re-order  image  data  in  decreasing 
order  of  SNR.  A  modified  version14  of  this  transform  has  been  shown  to  improve  image  fidelity 
and  achieve  better  noise  reduction  than  PCA,  for  example. 

Mathematical transform techniques for noise reduction generally utilize the fact that noise is
uncorrelated whereas spectra (signals) have a fairly high degree of correlation. In the transform
domain, the signal is primarily restricted to a few factors whereas the noise is spread across all
factors. We use the term 'factors' to refer to images of eigenvalues in the transform domain.
Noise reduction can be achieved by retaining factors corresponding to high signal content,
removing factors predominantly corresponding to noise and computing the inverse transform.
Identifying  factors  corresponding  to  high  signal  content  is  an  important  step  in  the  noise 
reduction  process. 
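
A minimal sketch of an MNF-style transform is given below to make the retain-and-invert step concrete. It is not the modified transform cited above: the noise covariance is estimated from differences between horizontally adjacent pixels (a common assumption that the signal is spatially smooth while the noise is not), and all names and data are hypothetical.

```python
import numpy as np

def mnf_denoise(cube, keep):
    """Order factors by decreasing SNR, keep the listed factors, and invert."""
    rows, cols, bands = cube.shape
    X = cube.reshape(-1, bands)
    mean = X.mean(axis=0)
    Xc = X - mean
    # Assumed noise estimate: neighbor differences cancel smooth signal, leaving ~2x noise.
    diff = (cube[:, 1:, :] - cube[:, :-1, :]).reshape(-1, bands)
    cov_noise = np.cov(diff, rowvar=False) / 2.0
    cov_data = np.cov(Xc, rowvar=False)
    n_evals, n_evecs = np.linalg.eigh(cov_noise)
    W = n_evecs / np.sqrt(n_evals)               # noise-whitening matrix
    w_evals, w_evecs = np.linalg.eigh(W.T @ cov_data @ W)
    order = np.argsort(w_evals)[::-1]            # decreasing data-to-noise variance ratio
    T = W @ w_evecs[:, order]                    # forward MNF transform
    factors = Xc @ T
    mask = np.zeros(bands)
    mask[list(keep)] = 1.0                       # retained factors need not be contiguous
    X_hat = (factors * mask) @ np.linalg.inv(T) + mean
    return X_hat.reshape(cube.shape), factors.reshape(rows, cols, bands), w_evals[order]

# Synthetic check with spatially smooth abundance maps and white noise.
rng = np.random.default_rng(1)
rows, cols, bands = 64, 64, 32
yy, xx = np.mgrid[0:rows, 0:cols]
abundances = np.stack([np.sin(np.pi * xx / cols), yy / rows], axis=-1)
axis = np.linspace(0.0, 1.0, bands)
pure_spectra = np.stack([np.sin(3.0 * axis), np.cos(5.0 * axis)])
clean = abundances @ pure_spectra
noisy = clean + rng.normal(0.0, 0.2, clean.shape)
denoised, factors, snr_eigs = mnf_denoise(noisy, keep=[0, 1, 2])
```

Because `keep` is an arbitrary list of indices, the same function can drop an early factor while retaining later ones.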

The  identification  of  factors  to  include  is  invariably  a  manual  process  and  is  the  key  impediment 
to  routine  application  of  these  methods  for  noise  reduction.  First,  the  manual  selection  will  vary 
from  practitioner  to  practitioner,  leading  to  variance  in  the  results  obtained  from  the  same  data 
set.  The  scientific  conclusions  or  confidence  in  results,  hence,  may  vary  in  an  unpredictable 
manner. Second, the need to examine every eigenvalue image (or, at least, a large set of images)
is time-consuming. The decision to exclude or include images with questionable content is
especially  difficult  and  requires  significant  time  as  some  quantitative  guidance  is  often  used.  For 
example,  we  have  used  comparisons  of  values  from  sample  and  sample-less  regions.  These  two 
factors  are  a  key  barrier  in  the  use  of  these  post-processing  techniques  for  enhancing  IR  imaging 
data. 

There are many dimension reduction and noise reduction schemes proposed15,16. Many of these
methods15,17,18 choose all factors before a certain cutoff (k) determined based on predefined
criteria. However, the assumption that all of the first k factors are important is questionable. The
MNF approach was specifically developed to overcome the observation that the first k factors in
PCA were not always optimal. Other methods16,19 can be computationally expensive or do not
utilize some of the features of the data in factors.

A  general  criticism  of  these  methods  is  that  they  do  not  explicitly  account  for  the  spatial  and 
spectral  information  in  the  data.  For  example,  PCA  separates  features  in  the  spatial  domain  by 
accounting  for  variance  in  the  scene.  The  variance  may  arise  from  the  data,  sensor  or  may  be  an 
artifact.  Similarly,  the  signal  in  the  re-ordering  of  MNF  factors  is  assumed  to  be  features  in  the 
image but could come from factors other than the sample of interest. For example, Figure 2
shows the 4th, 8th, 12th and 19th MNF factors for FT-IR data from a breast tissue sample. The 4th
MNF  factor  shows  interesting  tissue  structural  features.  Although  the  8th  factor  has  higher  SNR 
compared  to  the  12th  or  19th  factor,  the  12th  and  19th  factors  contain  relatively  more  features 
of  interest.  We  would  include  the  12th  and  19th  factors  but  not  the  8th  in  a  noise  reduction  scheme 
involving  MNF  transform.  The  8th  factor  likely  arises  from  illumination  or  water  vapor 
differences  and  not  from  the  sample  itself. 


Figure 2. (A) 4th MNF factor (tissue structural features visible) (B) 8th MNF factor (C) 12th
MNF factor (D) 19th MNF factor. The 8th factor has fewer structural features compared to the
12th or 19th factor.


Hence, we proposed a factor selection algorithm that selects factors based on structural features
in a quantitative manner. Although we illustrate the utility of the proposed algorithm for tissue
FT-IR data, the technique is more general and can be applied to any other data in which
structures in images are well described by edges. We could also use the proposed factor selection
algorithm with other transform techniques such as PCA. A generalization of the MNF
transform has been proposed20. However, we did not observe the kind of distortion described
in 20 in our data and therefore did not find the need to use the generalized MNF. We
demonstrate the efficacy of this automated SNR enhancement by applying the process to breast
tissue data. The effects of SNR are quantitatively measured by the accuracy of classifying tissue.
Over 5 million spectra have been acquired from approx. 475 samples using 4 cm⁻¹ resolution
over the 7200-720 cm⁻¹ range and 6.25 microns on a side per pixel. Data handling and analysis are
on-going.  The  data  were  acquired  using  a  tissue  microarray  with  no  restrictions  on  age  or  prior 
PSA  reading.  The  archiving  and  record  keeping  for  such  data  sets  became  a  challenge.  Hence, 
we  developed  data  handling  tools  to  both  maintain  a  database  of  properties  as  well  as  visualize 
the  data  in  a  microarray  format.  For  example,  one  acquired  data  set  is  shown  below  in  Figure  3. 
Figure 3. Approximately 475 viable samples for further analysis acquired by FT-IR
imaging  and  classified  as  per  optimized  protocols  developed  previously  in  this  project. 

A second set of 460 samples was also acquired for validation studies. This large scale data
acquisition  has  never  been  previously  reported  and  is  a  direct  result  of  the  optimizations 
accomplished  in  year  1  of  this  project.  Corresponding  to  each  sample  in  the  tissue  array  above, 
we  have  developed  a  database  to  store  information  for  the  patient,  including  age,  PSA  values  at 
the  time  of  diagnosis,  Gleason  grade  and  stage  on  diagnosis  as  well  as  outcome. 


As per previous studies in year 1, we determined that there was a need to acquire data with a
signal-to-noise ratio (SNR) of at least 1000:1 (or, 30 dB). One outstanding question is how to predict
the required SNR for any classification task. This is a major issue on which no useful guidance
was available in the literature. In observing the data from many samples, it became clear that
new tools were needed to visualize the diversity and usefulness of particular samples. In particular,
one key element of the protocol depends on a quality check. If contamination exists in a sample
or the sample does not belong to a population that is similar to the one that was used to construct
a calibration of the data, then the sample will clearly lead to incorrect results. Such a sample
must be flagged during quality control but there was no obvious means to do so. Hence, we
developed a new visualization system for spectrum-wide analysis of the data.


First,  we  recall  that  not  every  point  in  the  spectrum  is  actually  useful  in  calibration  or  prediction. 
The  data  are  reduced  to  a  potential  set  of  descriptors,  termed  metrics,  which  are  peak  height 
ratios,  areas,  positions  or  even  spatial  indices.  Only  a  few  of  these  metrics  are  useful  in 
calibration,  and  consequently,  in  predicting  histopathology.  Hence,  we  employ  the  visualization 
only  for  a  set  of  metrics.  A  view  of  the  developed  software  and  typical  plot  resulting  from  the 
analysis  is  shown  in  Figure  4. 
Figure 4. A representation of metric-patient data to determine quality and consistency in
large scale data analysis. Many representations are possible, including the one shown here.
Here, the value of (μ1-μ2)/σ for each metric is represented, where μ1 is the mean of
epithelium pixels for one patient for a particular metric, μ2 is the mean of stroma pixels
for one patient for that metric, and σ is the standard deviation of the entire
metric. Hence, (μ1-μ2)/σ is a measure of classification potential in separating epithelium
from stroma. Patient no. 34 can be seen to have outlier values that must be investigated in
detail so as not to become a confounding variable.
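
The quantity plotted in Figure 4 can be computed as in the sketch below. The data layout (one pair of pixel-by-metric arrays per patient), the pooling of all pixels to obtain σ, and the median-based outlier rule are assumptions made for illustration; the report does not specify how outliers are flagged.

```python
import numpy as np

def separation_matrix(patients):
    """patients: list of (epithelium, stroma) arrays, each of shape (n_pixels, n_metrics).

    Returns an (n_patients, n_metrics) array of (mu1 - mu2) / sigma values.
    """
    all_pixels = np.concatenate([np.concatenate(pair) for pair in patients])
    sigma = all_pixels.std(axis=0)               # spread of each metric over all data
    return np.array([(epi.mean(axis=0) - stroma.mean(axis=0)) / sigma
                     for epi, stroma in patients])

def flag_outliers(scores, k=5.0):
    """Flag patients whose score on any metric is far from the across-patient median."""
    median = np.median(scores, axis=0)
    mad = np.median(np.abs(scores - median), axis=0)
    robust_z = np.abs(scores - median) / (1.4826 * mad)
    return np.flatnonzero((robust_z > k).any(axis=1))

# Synthetic example: 8 hypothetical patients, 3 metrics; patient 5 is made anomalous.
rng = np.random.default_rng(3)
patients = []
for patient_id in range(8):
    epi_mean = -1.0 if patient_id == 5 else 1.0
    epithelium = rng.normal(epi_mean, 0.5, size=(500, 3))
    stroma = rng.normal(0.0, 0.5, size=(500, 3))
    patients.append((epithelium, stroma))
scores = separation_matrix(patients)
flagged = flag_outliers(scores)
```

A flagged patient is only a candidate for closer inspection, in the same sense that patient no. 34 is singled out in the figure.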


Task 2. Analyze spectroscopic imaging data for biochemical markers of tumor and develop
numerical algorithms for grading cancer

Goal:  Develop  algorithm  for  malignancy  recognition.  Models  will  be  constructed  and  optimized 
using  Genetic  Algorithms  operating  on  identified  metrics.  Models  will  be  tested  and  validated 
using  ROC  curves  with  pathologist  marking  as  the  ground  truth.  A  protocol  for  segmenting 
benign  from  atypical  condition  will  be  available.  (Months  11-18)  Three  specific  aims  from  the 
statement  of  work  (SOW)  are: 

a.  Identify  samples  to  be  imaged  (Months  1-3)  by  examining  stained  slides 

b.  Obtain  unstained  samples  to  be  imaged  and  define  regions  for  calibration  and 
validation  (Months  4-7) 

c.  Perform  histologic  identification  on  prostate  samples  and  validate 

d.  Reduce  spectral  metrics  to  those  useful  in  identifying  atypia  (Months  8-12) 


e.  Develop  protocols  and  validate  distinction  between  benign-appearing  and  atypical 
tissue  (Months  12-18) 

f.  Develop  calibration  for  predicting  cancer  grade  (Months  18-22) 

g.  Develop  protocols  and  validate  Gleason  grading  of  tumor  (Months  18-27) 


Activities: 


Goal:  Data  acquisition  and  treatment  protocol  will  be  optimized  and  feedback  loop  implemented. 
Image  sets  will  be  acquired  at  low  averaging  and  extensive  averaging  conditions  to  verify 
performance  and  optimize  algorithm.  A  validated  protocol  for  collecting  data  will  be  available. 
(Months  5-7) 

Activities:  Data  were  acquired  and  experimental  conditions  were  optimized  to  help  determine 
the  operating  points  for  prostate  histology.  Briefly,  the  spectral  resolution  was  not  found  to  be 
important  unless  coarse  resolution  was  obtained.  SNR  was  found  to  be  crucial  and  a  plot  of  the 
SNR  versus  the  classification  accuracy  yielded  the  optimal  operating  point.  Results  are 
summarized  in  a  peer-reviewed  manuscript21  and  the  methodology  is  described  in  a  review 
paper.  Results  were  presented  at  three  different  meetings. 

A  single  button  operation  is  implemented  in  our  software  that  now  pre-processes  data  and  adjusts 
for  appropriate  SNR.  A  second  step  can  then  classify  the  resulting  data  into  histologically  correct 
classes. 

Goal: Data will be acquired from samples identified in Task 2, sub-task a. Data at 4 cm⁻¹ spectral
resolution, imaging ~6 micrometers of sample per pixel, will be acquired with a signal-to-noise
ratio of greater than 1000:1. At least 375 samples will be imaged to provide an estimated
40 million spectra. Data will continuously be available for analysis in this period. (Months 8-18)

Activities: Over 4 million spectra have been acquired from approx. 460 samples. Data handling
and analysis are ongoing. The data were acquired using a tissue microarray with no restrictions
on  age  or  prior  PSA  reading. 


TASK  2E:  DEVELOP  PROTOCOLS  AND  VALIDATE  DISTINCTION  BETWEEN 
BENIGN-APPEARING  AND  ATYPICAL  TISSUE 

We were able to accomplish task 2e entirely and a manuscript has been submitted (under
review). An invention disclosure was filed with the Office of Technology Management, which
then decided to file a preliminary patent on the work.

We  develop  a  new  fully-automated  method  to  classify  cancer  versus  non-cancer  prostate  tissue 
samples.  The  classification  algorithm  uses  morphological  features  -  geometric  properties  of 
epithelial  cells/nuclei  and  lumens  -  that  are  quantified  based  on  H&E  stained  images  as  well  as 
FT-IR  images  of  the  samples.  By  restricting  the  features  used  to  geometric  measures,  we  sought 
to  mimic  the  pattern  recognition  process  employed  by  human  experts,  and  achieve  a  robust 
classification  procedure  that  can  produce  consistently  high  accuracy  across  independent  data 
sets. We systematically evaluate the performance of the new method through cross-validation,
and examine its robustness across data sets. We also summarize the specific morphological
features  that  prove  to  be  most  informative  in  classification. 
Figure 5. IR imaging data and its use in histologic classification. (Upper row) IR imaging
data  (b)  is  acquired  for  an  unstained  tissue  section  (a).  The  data  is  then  classified  into  cell 
types  and  a  classified  image  (c)  is  obtained.  The  colors  indicate  cell  types  in  a  histologic 
model  of  prostate  tissue.  This  method  is  robust  and  applied  to  hundreds  of  tissue  samples 
using  the  tissue  microarray  (TMA)  format.  (Lower  row)  H&E  (d)  and  IR  classified  (e) 
images  of  a  part  of  the  TMAs  used. 


Methods:  Several  new  methods  were  developed  to  accomplish  the  task. 

We  begin  with  a  description  of  the  computational  pipeline.  As  noted  above,  a  key  aspect  of  our 
approach  is  the  use  of  FT-IR  imaging  data  on  a  serial  section  that  is  H&E-stained  to  enhance  the 
segmentation  of  nuclei  and  lumens.  The  first  two  components  of  the  pipeline  (§1-2)  are  geared  to 
this  functionality,  while  the  next  three  components  (§3-5)  exploit  the  segmented  features 
obtained from image data to classify the tissue sample (Figure 6).
Figure 6. Overview of the approach. (a, b) The FTIR spectroscopic imaging data-based
cell-type classification (IR classified image) is overlaid with the H&E stained image (a), leading to
segmentation of nuclei and lumens in a tissue sample (b). (c, d, e) Features are extracted and
selected (c), and used by the classifier (d) to predict (e) whether the sample is cancerous or
benign.


1.  Image  Registration 

Given  two  images,  the  image  registration  problem  can  be  defined  as  finding  the  optimal  spatial 
and intensity transformation of one image to the other. Here, the two images are the H&E stained and
“IR classified” images, which were acquired from adjacent tissue samples. The IR classified
image represents the FT-IR imaging data, processed as indicated in Figure 5, to classify each
pixel  as  a  particular  cell  type.  Although  the  two  samples  were  physically  in  the  same  intact  tissue 
and  are  structurally  similar,  the  two  images  have  different  properties  (total  image  and  pixel  sizes, 
contrast mechanisms and data values). Hence, identifying features with which to spatially register
the images is not trivial. The H&E image provides detailed morphological information that could
ordinarily be
used  for  registration,  but  the  IR  image  lacks  such  information.  On  the  other  hand,  the  IR  image 
specifies  the  exact  areas  corresponding  to  each  cell  type,  but  the  difficulty  in  precisely  extracting 
such  regions  from  the  H&E  image  hinders  us  from  using  cell-type  information  for  registration. 
The  only  obvious  features  are  macroscopic  sample  shape  and  empty  space  (lumens)  inside  the 
samples. To utilize these two features and to avoid problems due to differences in the two
imaging techniques, both images are first converted into binary images. Due to the binarization,
the intensity transformation is not necessary. As a spatial transformation, we use an affine
transformation (f) where a coordinate (x_i, y_i) is transformed to the (x_j, y_j) coordinate after
translations (t_x, t_y), rotation by θ, and scaling by factor s.
Accordingly, we find the optimal parameters of the affine transformation that minimize the
absolute intensity difference between two images (I_reference and I_target). In other words, image
registration amounts to finding the optimal parameter values

$$(t_x^*, t_y^*, \theta^*, s^*) = \arg\min_{t_x,\, t_y,\, \theta,\, s} \sum_{x,y} \left| I_{reference} - f(I_{target}) \right| .$$

The downhill simplex method is applied to solve the above equation. An example of this
registration process is shown in Figure 4.
Figure 4. Image Registration. H&E stained images and IR classified images are first
converted into binary images. The IR classified image is overlaid with the H&E stained
image by affine transformation, with the optimal matching being found by minimizing the
absolute intensity difference between two images. After registration, original annotations
(color and/or cell-type information) of each image are restored.
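
The registration step can be sketched with SciPy's Nelder-Mead (downhill simplex) optimizer. The sketch assumes a transformation of the form x_j = s(x_i cos θ - y_i sin θ) + t_x, y_j = s(x_i sin θ + y_i cos θ) + t_y applied about the image center in array (row, column) coordinates; the exact parameterization used in this work is not recoverable from the text, and the test images, step sizes and tolerances below are hypothetical.

```python
import numpy as np
from scipy import ndimage, optimize

def warp(image, params):
    """Resample image under translation (t_row, t_col), rotation theta and scale s."""
    t_row, t_col, theta, s = params
    rotation = np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta),  np.cos(theta)]])
    matrix = s * rotation
    center = (np.array(image.shape) - 1) / 2.0
    # affine_transform maps output coordinates to input coordinates: in = matrix @ out + offset
    offset = center - matrix @ center + np.array([t_row, t_col])
    return ndimage.affine_transform(image, matrix, offset=offset, order=1)

def cost(params, reference, target):
    """Mean absolute intensity difference between the reference and the warped target."""
    return np.mean(np.abs(reference - warp(target, params)))

# Hypothetical binary "tissue" with a hole, and a shifted copy as the target.
reference = np.zeros((64, 64))
reference[16:48, 12:52] = 1.0
reference[26:36, 24:40] = 0.0
target = np.roll(reference, shift=(3, -2), axis=(0, 1))

x0 = np.array([0.0, 0.0, 0.0, 1.0])                 # identity transform
# Binary images give a nearly piecewise-constant cost, so start with a wide simplex.
simplex = np.vstack([x0, x0 + np.diag([2.0, 2.0, 0.05, 0.05])])
cost0 = cost(x0, reference, target)
result = optimize.minimize(cost, x0, args=(reference, target), method="Nelder-Mead",
                           options={"initial_simplex": simplex, "xatol": 1e-3,
                                    "fatol": 1e-6, "maxiter": 2000})
```

The explicit `initial_simplex` is a design choice for this sketch: the default simplex is very small around zero-valued parameters and may stall on binarized images.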


2.  Identification  of  epithelial  cells  and  their  morphologic  features 

While  a  number  of  factors  are  known  to  be  transformed  in  cancerous  tissues,  epithelial 
morphology  is  utilized  as  the  clinical  gold  standard.  Hence,  we  focus  here  on  cellular  and  nuclear 
morphology  of  epithelial  nuclei  and  lumens.  These  structures  are  different  in  normal  and 
cancerous tissues, but are not widely used in automated analysis for a few reasons. First, as
described above, simple detection of epithelium from H&E images is difficult. Second, detection
of epithelial nuclei may be confounded by a stromal response that is not uniform for all grades
and types of cancers. We focused first on addressing these two challenges that hinder
automatically parsing morphologic features such as the size and number of epithelial nuclei and
lumens,  distance  from  nuclei  to  lumens,  geometry  of  the  nuclei  and  lumens,  and  others  (§3).  In 
order  to  use  these  properties,  the  first  step  is  to  detect  nuclei  and  lumens  correctly  and  we  sought 
to  develop  a  robust  strategy  for  the  same. 

2.1.  Lumen  Detection 

In  H&E  stained  images,  lumens  are  recognized  to  be  empty  white  spaces  surrounded  by 
epithelial cells. In normal tissues, lumens are larger in diameter and can have a variety of shapes.
In  cancerous  tissues,  lumens  are  progressively  smaller  with  increasing  grade  and  generally  have 
less  distorted  elliptical  or  circular  shapes.  Our  strategy  to  detect  lumens  was  to  find  empty  areas 
that  are  located  next  to  the  areas  rich  in  epithelium.  White  spots  inside  the  sample  can  be  found 
from  the  H&E  image,  and  the  pixels  corresponding  to  epithelial  cells  can  be  mapped  on  the  H&E 
image  from  the  IR  classified  image  through  image  registration.  We  note  that  while  lumens  are 
ideally  completely  surrounded  by  epithelial  cells  (called  complete  lumens),  some  samples  have 
lumens (called incomplete lumens) that violate this criterion because only a part of the lumen is
present  in  the  sample.  To  identify  these  incomplete  lumens,  we  use  heuristic  criteria  based  on  the 
size,  shape,  presence  of  epithelial  cells  and  background  around  the  areas,  and  distance  from  the 
center  of  the  tissue.  (See  Supplementary  Materials  for  details.) 
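
The complete-lumen criterion (an empty region bordered by epithelium) can be sketched as follows. The size and epithelial-fraction thresholds are hypothetical, the masks are assumed to be already registered boolean arrays, and the heuristics for incomplete lumens are not covered.

```python
import numpy as np
from scipy import ndimage

def detect_lumens(white_mask, epithelium_mask, min_size=20, min_epithelial_fraction=0.5):
    """Keep white regions whose immediate surroundings are mostly epithelium."""
    labels, n_regions = ndimage.label(white_mask)
    lumens = np.zeros_like(white_mask, dtype=bool)
    for region_id in range(1, n_regions + 1):
        region = labels == region_id
        if region.sum() < min_size:
            continue                                   # too small to be a lumen
        ring = ndimage.binary_dilation(region, iterations=2) & ~region
        if epithelium_mask[ring].mean() >= min_epithelial_fraction:
            lumens |= region
    return lumens

# Hypothetical masks: one gap ringed by epithelium, one gap with no epithelium nearby.
white = np.zeros((40, 40), dtype=bool)
white[10:20, 10:20] = True
white[28:36, 28:36] = True
epithelium = np.zeros((40, 40), dtype=bool)
epithelium[7:23, 7:23] = True
epithelium &= ~white
lumens = detect_lumens(white, epithelium)
```

In practice `white_mask` would come from thresholding the H&E image and `epithelium_mask` from the registered IR classified image.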

2.2.  Nucleus  Detection  -  single  epithelial  cells 

Epithelial  nucleus  detection  by  automated  analysis  is  more  difficult  than  lumen  detection  due  to 
variability  in  staining  and  experimental  conditions  under  which  the  entire  set  of  H&E  images 
were  acquired.  Differences  between  normal  and  cancerous  tissues,  and  among  different  grades  of 
cancerous  tissues,  also  hamper  facile  detection.  To  handle  such  variations  and  make  the  contrast 
of  the  images  consistent,  we  perform  smoothing  and  adaptive  histogram  equalization  prior  to 
nuclei  identification.  Nuclei  are  relatively  dark  and  can  be  modeled  as  small  elliptical  areas  in  the 
stained  images.  This  geometrical  model  is  often  confounded  as  multiple  nuclei  can  be  so  close  as 
to  appear  like  one  large,  arbitrary-shaped  nucleus.  Also,  small  folds  or  edge  staining  around 
lumens  can  make  the  darker  shaded  regions  difficult  to  analyze.  Here,  we  exploit  the  information 
provided  by  the  IR  classified  image  to  limit  ourselves  to  epithelial  cells,  and  use  a  thresholding 
heuristic  on  a  color  space-transformed  image  to  identify  nuclei  with  high  accuracy.  Epithelial 
pixels  that  are  identified  on  the  H&E  images  using  the  IR  overlay  provide  pixels  of  dominated  by 
one  of  two  colors:  blue  or  pink,  which  arise  from  the  nuclear  and  cytoplasmic  component 
respectively.  For  nuclei  restricted  to  epithelial  cells  in  this  manner,  a  set  of  general  observations 
were  made  that  led  us  to  convert  the  stained  image  to  a  new  color  space  “RG-B”  (|R  +  G-B|). 
(R,  G,  and  B  represent  the  intensity  of  Red,  Green,  and  Blue  channels,  respectively.)  This 
transformation,  followed  by  suitable  thresholding,  was  able  to  successfully  characterize  the  areas 
where  nuclei  are  present.  The  threshold  values  are  adaptively  determined  for  Red  and  Green 
channels  due  to  the  variations  in  the  color  intensity.  (See  Supplementary  Materials  for  details.) 
Finally, holes and gaps within nuclei are filled by a morphological closing operation, and each
nucleus is segmented using a watershed algorithm, followed by elimination of false detections.
The size, shape, and average intensity of each candidate are considered to identify and remove
artifactual nuclei. Figure 5 details the nucleus detection procedure.

Figure 7. Nucleus Detection. Smoothing and adaptive histogram equalization are
performed to alleviate variability in the H&E stained image and to obtain better contrast.
“RG - B” conversion followed by thresholding characterizes the areas where nuclei exist.
A morphological closing operation is performed to fill holes and gaps within nuclei, and a
watershed algorithm segments each individual nucleus. The segmented nuclei are
constrained by their shape, size, and average intensity, and by the epithelial cell
classification (green pixels) provided by the overlaid IR image.
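The color transform and thresholding step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names, the single fixed `threshold` argument, and the choice to flag pixels with low |R + G - B| values as nuclear candidates are assumptions made here. The report determines thresholds adaptively (see Supplementary Materials) and follows this step with morphological closing and watershed segmentation, neither of which is shown.

```python
import numpy as np


def rg_minus_b(rgb):
    """Return the |R + G - B| value of every pixel of an (H, W, 3) RGB image."""
    rgb = np.asarray(rgb, dtype=np.float64)
    return np.abs(rgb[..., 0] + rgb[..., 1] - rgb[..., 2])


def candidate_nuclei(rgb, epithelium_mask, threshold):
    """Flag epithelial pixels whose |R + G - B| value is below `threshold`.

    Assumption: blue-dominated (nuclear) pixels give small |R + G - B| values,
    while pink (cytoplasmic) pixels give large ones. `epithelium_mask` stands
    in for the IR-derived epithelium classification overlaid on the H&E image.
    """
    mask = np.asarray(epithelium_mask, dtype=bool)
    return (rg_minus_b(rgb) < threshold) & mask


# Hypothetical 1 x 3 image: a bluish pixel, a pink pixel, and a bluish pixel
# that lies outside the epithelium mask.
image = np.array([[[60, 50, 140], [230, 150, 190], [60, 50, 140]]])
epithelium = np.array([[True, True, False]])
nuclei = candidate_nuclei(image, epithelium, threshold=100)
```

Restricting the result to the epithelium mask mirrors the role of the IR overlay: dark regions outside epithelium, such as folds or edge staining around lumens, are never considered.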


3.  Feature  Extraction 

As mentioned above, the characteristics of nuclei and lumens change in cancerous tissues. In
normal tissue, epithelial cells are located mostly in thin layers around lumens. In cancerous
tissue, these cells generally grow to fill lumens, resulting in a decrease in the size of lumens, with
the shape of lumens becoming more elliptical or circular. The epithelial association with a lumen
becomes inconsistent, and epithelial foci may adjoin lumens or may exist without an
apparent lumen. Epithelial cells invading the extra-cellular matrix also result in a deviation from
the well-formed lumen structure; this is well-recognized as a hallmark of cancer. Due to filling of
the lumen space and invasion into the extra-cellular space, the number density of epithelial cells
increases in tissue. The sizes of individual epithelial cells and their nuclei also tend to increase as
the malignancy of a tumor increases. Motivated by such recognized morphological differences
between normal and cancerous tissues, we chose to use epithelial nuclei and lumens as the basis
of the several quantitative features that our classification system works with. (See examples of
such features in Figure 6.) It is notable that these observations are qualitative in actual clinical
practice and have not been previously quantified.

Figure 8. Example Features. Each panel shows one example feature, along with the
distributions of the feature’s values for cancer (red) and benign (blue) classes.


3.1.  Epithelial  cell-related  features 

We  use  epithelial  cell  type  classification  from  IR  data  to  measure  epithelium-related  features. 
However,  individual  epithelial  cells  in  the  tissue  are  not  easily  delineated.  Therefore,  in  addition 
to  features  directly  describing  epithelial  cells,  we  also  quantify  properties  of  epithelial  nuclei, 
which  are  available  from  the  segmentation  described  in  §2.  The  quantities  we  measure  in 
defining  features  are:  (1)  size  of  epithelial  cells,  (2)  size  of  epithelial  nuclei,  (3)  number  of  nuclei 
in  the  sample,  (4)  distance  from  a  nucleus  to  the  closest  lumen,  (5)  distance  from  a  nucleus  to  the 
epithelial  cell  boundary,  (6)  number  of  “isolated”  nuclei  (nuclei  that  have  no  neighboring  nucleus 
within a certain distance), (7) number of nuclei located “far” from lumens, and (8) entropy of
the spatial distribution of nuclei (Figure 6G). Supplementary Materials provide specifics of these
measures and their calculation.
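Two of the nucleus-based quantities listed above, the number of “isolated” nuclei and the entropy of the spatial distribution of nuclei, can be illustrated as follows. The exact definitions are given in the Supplementary Materials and are not reproduced here, so this sketch makes its own assumptions: nuclei are represented by centroid coordinates, isolation is judged with a hypothetical `radius`, and entropy is the Shannon entropy of nucleus counts over a grid of square cells of hypothetical size `cell`.

```python
import numpy as np


def count_isolated_nuclei(centroids, radius):
    """Count nuclei that have no other nucleus centroid within `radius` pixels."""
    pts = np.asarray(centroids, dtype=float)
    if len(pts) == 0:
        return 0
    # Pairwise Euclidean distances between all nucleus centroids.
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # ignore the distance of a nucleus to itself
    return int(np.sum(d.min(axis=1) > radius))


def spatial_entropy(centroids, image_shape, cell):
    """Shannon entropy (in bits) of nucleus counts over a grid of square cells."""
    pts = np.asarray(centroids, dtype=float)
    if len(pts) == 0:
        return 0.0
    row_edges = np.arange(0, image_shape[0] + cell, cell)
    col_edges = np.arange(0, image_shape[1] + cell, cell)
    counts, _, _ = np.histogram2d(pts[:, 0], pts[:, 1], bins=[row_edges, col_edges])
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log2(p)))


# Hypothetical centroids given as (row, column) pairs.
isolated = count_isolated_nuclei([(0, 0), (1, 0), (10, 10)], radius=2)
entropy_bits = spatial_entropy([(1, 1), (6, 6)], image_shape=(10, 10), cell=5)
```

Under these assumptions, nuclei spread evenly over many cells give a high entropy, while nuclei concentrated in a few cells give a low one.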


3.2.  Lumen-related  features 

Features  describing  glands  have  been  shown  to  be  effective  in  PCa  classification.  Here,  we  try  to 
characterize  lumens  and  mostly  focus  on  the  differences  in  the  shape  of  the  lumens.  The 
quantities we measure in defining these features are: (1) size of a lumen, (2) number of lumens,
(3) lumen “roundness”, defined as $L_{peri} / (2 \pi r)$, where $L_{peri}$ is the perimeter of the
lumen, $L_{area}$ is the size of the lumen, and $r$ is the radius of a circle of size $L_{area}$,
(4) lumen “distortion” (Figure 6A), computed as $\mathrm{STD}(d_L) / \mathrm{AVG}(d_L)$, where
$d_L$ is the distance from the center of a lumen to the boundary of the lumen and AVG(·) and
STD(·) represent the average and standard deviation, (5) lumen
“minimum  bounding  circle  ratio”  (Figure  6B),  defined  as  the  ratio  of  the  size  of  a  minimum 
bounding  circle  of  a  lumen  to  the  size  of  the  lumen,  (6)  lumen  “convex  hull  ratio”  (Figure  6C), 
which  is  the  ratio  of  the  size  of  a  convex  hull  of  a  lumen  to  the  size  of  the  lumen,  (7)  symmetric 
index  of  lumen  boundary  (Figure  6E,  see  Supplementary  Materials),  (8)  symmetric  index  of 
lumen  area  (Figure  6F,  see  Supplementary  Materials),  and  (9)  spatial  association  of  lumens  and 
cytoplasm-rich  regions  (Figure  6D,  see  Supplementary  Materials).  Features  (3)  -  (8)  are  various 
ways  to  summarize  lumen  shapes,  while  feature  (9)  is  motivated  by  the  loss  of  functional 
polarization  of  epithelial  cells  in  cancerous  tissues. 
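The roundness and distortion measures can be computed from an ordered lumen outline as sketched below. The sketch assumes roundness is the perimeter divided by the circumference of the equal-area circle and distortion is STD(d)/AVG(d) of center-to-boundary distances; it further assumes that the lumen is given as a closed polygon and that its “center” is the mean of the boundary points. These representation choices and the function name are illustrative, not taken from the report.

```python
import numpy as np


def lumen_shape_features(boundary):
    """Return (roundness, distortion) for a closed outline.

    boundary: (N, 2) array of ordered (x, y) points along the lumen outline.
    """
    pts = np.asarray(boundary, dtype=float)
    nxt = np.roll(pts, -1, axis=0)  # successor of each point, wrapping around

    perimeter = np.sum(np.linalg.norm(nxt - pts, axis=1))
    # Shoelace formula for the area enclosed by the polygon.
    area = 0.5 * abs(np.sum(pts[:, 0] * nxt[:, 1] - nxt[:, 0] * pts[:, 1]))

    r = np.sqrt(area / np.pi)  # radius of a circle with the same area
    roundness = perimeter / (2.0 * np.pi * r)

    # Distances from the (assumed) center to every boundary point.
    d = np.linalg.norm(pts - pts.mean(axis=0), axis=1)
    distortion = d.std() / d.mean()
    return float(roundness), float(distortion)


# A finely sampled circle should give roundness close to 1 and distortion close to 0.
theta = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)
circle = np.column_stack([5.0 * np.cos(theta), 5.0 * np.sin(theta)])
circle_roundness, circle_distortion = lumen_shape_features(circle)

# A square has a longer perimeter than the equal-area circle, so roundness exceeds 1.
square_roundness, _ = lumen_shape_features([(0, 0), (2, 0), (2, 2), (0, 2)])
```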


3.3.  Global  &  local  tissue  features 

We  have  described  above  the  individual  measures  of  epithelium  and  lumen  related  quantities  that 
form  the  basis  of  the  features  used  by  our  classification  system.  Normally,  these  features  have  to 
be  summary  measures  over  the  entire  tissue  sample  or  desired  classification  area.  Hence,  we 
employ  average  (AVG)  or  standard  deviation  (STD),  and  in  some  cases  the  sum  total  (TOT)  of 
these  quantities  for  further  analysis.  These  features  are  called  “global”  features  since  they  are 
calculated  from  the  entire  sample.  However,  in  some  cases  global  features  may  be  misleading, 
especially  where  only  a  part  of  the  tissue  sample  is  indicative  of  cancer.  Therefore,  in  addition  to 
global  features,  we  define  “local”  features  by  sliding  a  rectangular  window  of  a  fixed  size 
(typically  100x100  pixels)  throughout  a  tissue  sample,  computing  the  average  or  sum  total  of  the 
feature  in  each  window,  and  computing  the  standard  deviation  and/or  extrema  over  the  values  for 
all  windows  (Figure  7).  In  all,  67  features  (29  global  and  38  local  features)  are  defined  capturing 
various  aspects  of  tissue  morphology. 
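The distinction between global and local features can be made concrete with a small sketch for the quantity “number of nuclei”. It assumes a binary map with one nonzero pixel per nucleus, and it assumes a window stride `step`, which the report does not state. Window sums are obtained from an integral image so that each window costs a constant number of lookups.

```python
import numpy as np


def window_sums(mask, win, step):
    """Sum of `mask` inside every win x win window placed on a grid of stride `step`."""
    mask = np.asarray(mask).astype(np.int64)
    height, width = mask.shape
    if win > height or win > width:
        raise ValueError("window is larger than the image")
    # Integral image with a leading row and column of zeros.
    ii = np.pad(np.cumsum(np.cumsum(mask, axis=0), axis=1), ((1, 0), (1, 0)))
    ys = np.arange(0, height - win + 1, step)
    xs = np.arange(0, width - win + 1, step)
    y0, x0 = np.meshgrid(ys, xs, indexing="ij")
    return ii[y0 + win, x0 + win] - ii[y0, x0 + win] - ii[y0 + win, x0] + ii[y0, x0]


def global_and_local_counts(nuclei_mask, win=100, step=50):
    """Global total plus local summary statistics of per-window nucleus counts."""
    sums = window_sums(nuclei_mask, win, step)
    return {
        "global_total": int(np.asarray(nuclei_mask).astype(np.int64).sum()),
        "local_std": float(sums.std()),
        "local_max": int(sums.max()),
    }


# Tiny hypothetical example: three nuclei on a 4 x 4 map, 2 x 2 windows.
toy = np.zeros((4, 4), dtype=int)
toy[0, 0] = toy[0, 1] = toy[3, 3] = 1
toy_sums = window_sums(toy, win=2, step=2)
toy_features = global_and_local_counts(toy, win=2, step=2)
```

In the toy example the global count is 3, while the local maximum of 2 reveals that two of the three nuclei are concentrated in one window, which is the kind of regional information a global summary hides.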


4.  Feature  Selection 

Feature  selection  is  the  step  where  the  classifier  examines  all  available  features  (67  in  our  case) 
with  respect  to  the  training  samples,  and  selects  a  subset  to  use  on  test  data.  This  selection  is 
generally  based  on  the  criterion  of  high  accuracy  on  training  data,  but  also  strives  to  ensure 
generalizability  beyond  the  training  data.  We  adopt  a  two-stage  feature  selection  approach  here. 
In the first stage, we generate a set of candidate features ($C_{candidate}$) by using the so-called
minimum-redundancy-maximal-relevance (mRMR) criterion. In each iteration, given a feature
set chosen thus far, mRMR chooses the single additional feature that is least redundant with the
chosen features, while being highly correlated with the class label. $C_{candidate}$ is a set of features
that is expected to be close to the optimal feature set for a dataset and a classifier under
consideration. It is constructed as follows. Given a feature set $F = (f_1, ..., f_M)$ ordered by mRMR,
the AUC of the set of $i$ top-ranked features is computed for varying values of $i$. We limit the value of
$i$ to be < 30. The feature subset with the best AUC is chosen as $C_{candidate}$. In the second stage,
feature selection continues with $C_{candidate}$ as the starting point, using the sequential floating
forward selection (SFFS) method. This method sequentially adds new features followed by
conditional deletion(s) of already selected features. Starting with $C_{candidate}$, SFFS searches for
a feature $x \notin C_{candidate}$ that maximizes the AUC among all feature sets $C_{candidate} \cup \{x\}$,
and adds it to $C_{candidate}$. Then, it finds a feature $x \in C_{candidate}$ that maximizes the AUC among
all feature sets $C_{candidate} - \{x\}$. If the removal of $x$ improves the highest AUC obtained by
$C_{candidate}$, $x$ is deleted from $C_{candidate}$. As long as this removal improves upon the highest AUC
obtained so far, the removal step is repeated. SFFS repeats the addition and removal steps until
the AUC reaches 1.0 or the number of additions and deletions exceeds 20, and the feature set with
the highest AUC thus far is chosen as the optimal feature set. The classification capability of a
feature set, required for feature selection, is measured by the area under the ROC curve (AUC),
obtained by cross-validation on the training set.
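The second-stage search can be sketched as follows. This is an illustrative reading of the SFFS procedure described above, not the authors' code: `score` is a hypothetical callable standing in for the cross-validated AUC of a feature set, ties are broken by sorted feature order, and the first (mRMR) stage is not shown.

```python
def sffs(candidate, all_features, score, max_steps=20):
    """Sequential floating forward selection starting from `candidate`.

    `score(feature_set)` is assumed to return a value in [0, 1], such as a
    cross-validated AUC. Returns the best feature set seen and its score.
    """
    current = set(candidate)
    best_set, best_score = set(current), score(current)
    steps = 0
    while best_score < 1.0 and steps < max_steps:
        remaining = sorted(f for f in all_features if f not in current)
        if not remaining:
            break
        # Forward step: add the feature that gives the highest score.
        added = max(remaining, key=lambda f: score(current | {f}))
        current = current | {added}
        steps += 1
        new_score = score(current)
        if new_score > best_score:
            best_set, best_score = set(current), new_score
        # Floating step: delete features while doing so beats the best score so far.
        while len(current) > 1 and steps < max_steps:
            dropped = max(sorted(current), key=lambda f: score(current - {f}))
            new_score = score(current - {dropped})
            if new_score <= best_score:
                break
            current = current - {dropped}
            steps += 1
            best_set, best_score = set(current), new_score
    return best_set, best_score


# Toy score: features 1 and 2 help, every other feature hurts.
def toy_score(features):
    useful = {1, 2}
    return 0.5 + 0.2 * len(features & useful) - 0.1 * len(features - useful)


selected, selected_score = sffs(candidate={3}, all_features=[1, 2, 3, 4], score=toy_score)
```

In the toy run the search starts from the unhelpful feature 3, adds feature 1, then drops feature 3 in the floating step, and finally adds feature 2, which shows how conditional deletion can correct an imperfect starting set.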

5.  Classification 

We  note  that  there  are  two  levels  of  classification  here.  In  the  first,  IR  spectral  data  is  used  to 
provide  histologic  images  where  each  pixel  has  been  classified  as  a  cell  type.  In  the  second,  the 
measures  from  H&E  images  and  IR  images  are  used  to  classify  tissue  into  disease  states.  In  this 
manuscript, we do not discuss the first classification task as its development and results are
well-documented. For the latter task, we used a well-established classification algorithm, namely
the support vector machine (SVM). Two cost factors, $C_+$ and $C_-$, are introduced to deal with
an imbalance in training data. The ratio between the two cost factors was chosen as

$C_+ / C_- = (\text{number of negative training examples}) / (\text{number of positive training examples})$

to make the potential total cost of the false positives and the false negatives the same. (See
Supplementary Materials for details.)
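As a small worked example of the cost-factor ratio, the snippet below computes C+/C- from class counts and checks that the two potential total costs balance. It uses the Data1 class sizes reported in this document (115 cancer, 66 benign) and assumes that cancer is the positive class; that assignment, the function name, and the choice of C- = 1 are assumptions made for illustration.

```python
def cost_factor_ratio(n_positive, n_negative):
    """Return C+/C- = (# negative training examples) / (# positive training examples)."""
    if n_positive <= 0 or n_negative <= 0:
        raise ValueError("both classes need at least one training example")
    return n_negative / n_positive


# Data1 class sizes, with cancer assumed to be the positive class.
ratio = cost_factor_ratio(n_positive=115, n_negative=66)

# Fixing C- = 1 (an arbitrary scale) gives C+ directly from the ratio.
c_minus = 1.0
c_plus = ratio * c_minus

# Potential total cost if every example of a class were misclassified.
total_cost_positive_errors = 115 * c_plus
total_cost_negative_errors = 66 * c_minus
```

With these counts the ratio is about 0.574, so errors on the more numerous class are individually cheaper, and the two totals are equal.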

6.  Data  preparation 

All of the H&E stained images were acquired on a standard optical microscope at 40x
magnification. The size of each pixel is 0.9636 µm x 0.9636 µm. The pixel size of the IR
images, on the other hand, is 6.25 µm x 6.25 µm. The acquisition was described in previous
years’ reports. Two data sets, stained under different conditions, were used in this study. The
first dataset (“Data1”) consists of 66 benign samples and 115 cancer samples, and the second set
(“Data2”) includes 14 benign and 36 cancer samples. These were previously acquired under the
grant.

Results  and  discussion:  We  then  applied  the  methods  to  classify  prostate  tissue  and  the  results 
are  presented  below. 

1.  The  classification  system  achieves  AUC  greater  than  0.97  on  both  data  sets 

We first performed K-fold cross validation on each dataset. The data set was divided into K
roughly equal-sized partitions, one partition was left out as the “test data”, the classifier was
trained on the union of the remaining K - 1 partitions (the “training data”) and evaluated on the
test data. This was repeated K times, with different choices of the left-out partition. (We set K =
10.) In each repetition, cross-validation on the training data was used to select the feature set
with the highest AUC as explained in §4. The correct and incorrect predictions in the test data,
across all K repetitions, were summarized into a ROC plot and the AUC was computed, along
with specificities when sensitivity equals 90, 95, or 99%. Since the cross-validation exercise
makes random choices in partitioning the data set, we examined averages of these performance
metrics over 10 repeats of the entire cross validation pipeline. The average AUCs for Data1 and
Data2 were 0.982 and 0.974, respectively (Table 1, “feature extraction” = “IR & HE”). At 90%,
95%, and 99% sensitivities, the average specificity achieved on Data1 was 94.76%, 90.91%, and
77.80%, respectively, while that on Data2 was 92.53%, 84.19%, and 49.54%, respectively.
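The two performance metrics used here, AUC and specificity at a fixed sensitivity, can be computed from pooled test-set scores as sketched below. The rank-based AUC formula is standard; the rule used to pick the operating threshold (the highest threshold that still reaches the target sensitivity) and the treatment of ties are assumptions, since the report does not spell them out.

```python
import numpy as np


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]
    # A correctly ordered pair counts 1, a tied pair counts 1/2.
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (pos.size * neg.size))


def specificity_at_sensitivity(scores, labels, target):
    """Specificity at the highest threshold whose sensitivity is at least `target`."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos = np.sort(scores[labels])[::-1]  # positive scores, highest first
    # Number of positives that must be called positive (guarded against rounding).
    k = max(1, int(np.ceil(target * pos.size - 1e-9)))
    threshold = pos[k - 1]  # samples with score >= threshold are called positive
    return float(np.mean(scores[~labels] < threshold))


# Hypothetical pooled scores: first four samples are cancer, last four are benign.
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.5, 0.3, 0.2]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_auc = auc(toy_scores, toy_labels)
spec_at_75 = specificity_at_sensitivity(toy_scores, toy_labels, 0.75)
spec_at_100 = specificity_at_sensitivity(toy_scores, toy_labels, 1.00)
```

The toy example shows the trade-off reported in Table 1: demanding that every positive be detected pulls the threshold down to the lowest positive score, which lets more negatives through and lowers specificity.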

One  way  to  interpret  the  above  values  is  to  examine  our  automated  pipeline  as  a  pre-screening 
mechanism  to  identify  the  samples  to  be  examined  by  a  human  pathologist.  At  a  “true  positive 
rate”  of  99%  (which  means  that  only  1%  of  the  cancer  samples  will  be  missed  by  the  screen),  the 
“false  positive  rate”  is  22.2%  (i.e.,  22.2%  of  the  benign  samples  will  make  it  through  the  screen) 
on average for Data1 (Table 1), thereby reducing the number of benign samples that the
pathologist must examine by 4.5-fold. While the error rate of manual pathology determinations
is generally accepted to be in the 1-5% range, inclusion of confounding cancer mimickers raises
the rate to as high as 7.5%. Also noteworthy is the observation that the same algorithm performs
consistently well on both data sets, which were obtained under different staining conditions.
This speaks to the robustness of the classification framework, an attribute that we investigated
further in the next exercise.
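The 4.5-fold figure quoted above follows directly from the Table 1 specificity at 99% sensitivity. The short calculation below reproduces it; the variable names are illustrative only.

```python
# Average specificity on the first data set at 99% sensitivity (Table 1).
specificity_at_99 = 0.7780

# Fraction of benign samples that pass the screen and still need review.
false_positive_rate = 1.0 - specificity_at_99

# Fold reduction in the number of benign samples reaching the pathologist.
benign_fold_reduction = 1.0 / false_positive_rate
```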

2.  Classification  system  is  robust  to  staining  conditions 

Here, we trained a classifier on Data1 and tested its performance on Data2. We observed an
average AUC of 0.956, with average specificity of 88.57%, 81.92%, and 26.86% at sensitivity
equaling 90%, 95%, and 99%, respectively (Table 2, “feature extraction” = “IR & HE”). These
values are competitive with the cross-validation results on Data2 (Table 1), where the training
and testing were both performed on (disjoint parts of) Data2.

3.  IR  data  is  critical  to  classification  performance 

To  assess  the  utility  of  the  IR-based  cell-type  classification,  we  repeated  the  above  exercises 
after  extracting  features  without  the  guidance  of  the  IR  data;  i.e.,  epithelial  cells  were  predicted 
from  the  H&E  images  alone  (see  Supplementary  Materials  for  details).  All  of  the  features 
defined  in  §3  were  used,  except  for  “Spatial  association  of  lumens  and  apical  regions”,  since  the 
distinction  between  cytoplasm-rich  and  nuclear-rich  region  in  epithelial  cells  was  unclear  in 
H&E  images.  The  results  from  this  disadvantaged  classifier  are  shown  in  Tables  1  and  2 
(“feature  extraction”  =  “HE  only”).  For  both  types  of  experiments,  we  obtained  lower  average 
AUCs  and  specificity  values.  For  instance,  the  AUC  of  cross-validation  in  Data2  (Table  1) 
dropped  from  0.974  to  0.880.  Similarly,  the  results  of  validation  between  datasets  (Table  2)  were 
substantially  worse  now  compared  to  the  IR-guided  classification,  with  the  AUC  dropping  from 
0.956  to  0.918.  This  indicates  that  feature  extraction  with  the  help  of  the  IR  cell-type 
classification  is  critical  to  consistent  and  reliable  classification  of  cancer  versus  benign  tissue 
samples.


Dataset  Feature      AUC              Sensitivity  Specificity (%)   Mf
         Extraction   AVG     STD      (%)          AVG      STD
-------  -----------  ------  -------  -----------  -------  -------  --
Data1    IR & HE      0.982   0.0030   90           94.76    1.64     13
                                       95           90.91    1.62
                                       99           77.80    5.52
         HE only      0.968   0.0052   90           91.64    2.26     11
                                       95           83.90    1.91
                                       99           53.43    13.65
Data2    IR & HE      0.974   0.0145   90           92.53    7.11     7
                                       95           84.19    10.84
                                       99           49.54    22.51
         HE only      0.880   0.0175   90           61.34    10.31    8
                                       95           22.21    10.06
                                       99           11.21    6.01

Table 1. Classification results via cross-validation.


AVG and STD denote the average and standard deviation across ten repeats of cross-validation. Mf
is the median size of the feature set obtained by feature selection from training data. Column
“Feature Extraction” indicates if features were obtained using H&E as well as IR data, or with
H&E data alone.


Feature      Dataset  AUC              Sensitivity  Specificity (%)   Mf
Extraction            AVG     STD      (%)          AVG      STD
-----------  -------  ------  -------  -----------  -------  -------  --
IR & HE      Train    0.994   0.0006   90           98.30    0.68     13
                                       95           96.58    1.10
                                       99           91.55    2.55
             Test     0.956   0.0089   90           88.57    5.96
                                       95           81.92    5.28
                                       99           26.86    15.50
HE only      Train    0.986   0.0021   90           97.77    0.97     10
                                       95           91.56    2.49
                                       99           79.29    4.47
             Test     0.918   0.0100   90           65.51    8.37
                                       95           46.14    7.53
                                       99           13.29    6.94

Table 2. Validation between datasets.

A classifier is trained on Data1 and tested on Data2. AVG and STD denote the average and
standard deviation. Mf is the median size of the optimal feature set. Column “Feature Extraction”
indicates if features were obtained using H&E as well as IR data, or with H&E data alone.
Column “Dataset” indicates if the performance metrics are from training data (Data1) or from
test data (Data2).

Previously, Tabesh et al. achieved an accuracy of 96.7% via cross validation in cancer/no-cancer
classification. Color, morphometric, and texture features were extracted, and all images
were acquired under similar conditions. We note that our classification result (Table 1), based
solely on morphology, is comparable to their result; however, the software developed by Tabesh
et al. was not available for evaluation on our data sets. Color and texture features could provide
additional information; however, their robustness to different data sets is questionable, and their
interpretation is not as obvious as that of morphological features, which are used in clinical
practice. Different data sets may have varied properties, which may be attributable to staining
variations, inconsistent image acquisition settings, and image preparation. The performance of
the same method based on texture features has been seen to change greatly from one data set to
another. Variations in staining may affect color features. In contrast, morphological features
were shown to be robust to varying image acquisition settings. Nonetheless, the quality of
morphological features is subject to the segmentation of histologic objects. Thus, any method based
on morphological features will benefit from the IR cell-type classification.


[Figure 9 annotations: Number of Lumen = 17; Number of Nuclei = 755]


Figure  9.  Global  and  Local  Feature  Extraction.  Global  features  are  extracted  from  the 
entire  tissue  sample,  and  local  features  are  extracted  by  sliding  a  window  of  a  fixed  size 
across  the  tissue  sample  and  computing  summary  statistics,  such  as  standard  deviation,  of 
window-specific  scores.  In  this  example,  the  global  feature  “number  of  nuclei”  has  value 
755,  while  one  example  position  of  the  sliding  window  is  shown,  with  “number  of  nuclei”  = 
29. 


4.  Examination  of  discriminative  features 

We examined the importance of each feature by its rank in the first phase of feature selection,
based on its “relevance” to the class label (see Supplementary Materials, mRMR). Since
different features (e.g., average or standard deviation, global or local features) based on the same
underlying quantity (e.g., “lumen roundness”) generally have similar relevance, we examined the
average relevance of features in each of 17 feature categories (Figure 8), for each data set. The
complete list of the individual features and their relevance and mRMR rank (for Data1) is
available in Figure 9. For Data1, lumen-related feature categories are most relevant in general,
while epithelium-related feature categories are most important for Data2. It is surprising that the
top 3 feature categories in Data1 (Figure 8, blue bars), namely size of lumen, lumen roundness, and
lumen convex hull ratio, have very low relevance in Data2, although we note that this may be
in large part due to variations in staining and malignancy of tumors between the two data sets.
Also, examining the features (or feature categories) with highest relevance alone may be slightly
misleading, because this examination does not account for redundancy among features.


Figure 10. Importance of 17 feature categories. The average “maximal relevance” of
features belonging to each feature category is shown, for both data sets, sorted in
decreasing order for the first data set.

Figure 11. List of features and their maximal relevance and “mRMR rank”. In the second
column, G and L represent global and local features, respectively. AVG, STD, TOT, and
MAX denote the average, standard deviation, total amount, and extremal value of features.
* In computing local features representing “size of lumen”, two options are available: one
is to consider only the part of the lumen within the window, and the other is to take the
entire lumen into account. The asterisk indicates that the former option was chosen.


Figure  12.  Optimal  features  for  distinguishing  cancer  and  benign  tissue  samples.  The  four 
features  shown  here  are  always  present  in  the  optimal  feature  set  chosen  by  the  classifier. 


Conclusions 

In  completing  this  task,  we  have  presented  a  means  to  eliminate  epithelium  recognition 
deficiencies  in  classifying  H&E  images  for  presence  or  absence  of  cancer.  The  method  is  entirely 
transparent  to  a  user  and  does  not  involve  any  adjustment  or  decision-making  based  on  spectral 
data.  We  were  able  to  achieve  very  effective  fusion  of  the  information  from  two  different 
modalities,  namely  optical  and  IR  microscopy,  that  provide  very  different  types  of  data  with 
different  characteristics.  Several  features  of  the  tissue  were  quantified  and  employed  for 
classification. We found that robust classification could be achieved using a few measures
that arise from epithelial/lumen organization and provide a reasonable
explanation for the accuracy of the model. The choice of combining the IR and optical data is
shown  to  be  necessary  for  achieving  the  high  accuracy  values  observed.  We  anticipate  that  the 
combined  use  of  the  two  microscopies  -  structural  and  chemical  -  will  lead  to  an  accurate,  robust 
and  automated  method  for  determining  cancer  within  biopsy  specimens. 


TASK  2F:  DEVELOP  CALIBRATION  FOR  PREDICTING  CANCER  GRADE 
(MONTHS  18-22) 


Motivation: 

Quality  assurance  in  clinical  pathology  plays  a  critical  role  in  the  management  of  patients  with 
prostate  cancer  as  pathology  is  the  gold  standard  of  diagnosis  and  forms  a  cornerstone  of  patient 
therapy.  Methods  to  integrate  quality  development,  quality  maintenance,  and  quality 
improvement  to  ensure  accurate  and  consistent  test  results  are,  hence,  critical  to  cancer 
management  in  any  setting.  These  factors  have  a  direct  bearing  on  patient  outcomes,  financial 
aspects  of  disease  management  as  well  as  malpractice  concerns.  One  of  the  major  failings  in 
prostate  pathology  today  is  the  rate  of  missed  tumors  and  variability  in  grading.  It  is  well  known 
that the grading of prostate tissues suffers from intra- and inter-pathologist variability. In
studies of reproducibility, exact intra-pathologist agreement was achieved in 43-78% of
instances, and exact inter-pathologist agreement was reported in 36-81% of instances.
It is also known that the variability of grading can be reduced
after pathologists are re-trained. There are many ways to educate pathologists, such as
meetings, courses, and online tutorials, but these are not time- and cost-effective for routine
everyday decisions. Therefore, building an automated, fast, and objective method to aid
pathologists in examining prostate tissues will greatly help to attain reliable and consistent
diagnoses. This will reduce healthcare costs and the chances of malpractice lawsuits as well as
improve patient outcomes in therapy.

Innovation  in  our  approach  and  potential  benefits: 

When a pathologist examines tissue, he/she looks at a stained image of tissue and mentally
compares it against a database of previous knowledge or information in books. In essence, the
pathologist is manually matching structural patterns he/she has seen earlier and mentally
recalling the diagnosis made such that he/she can make the same diagnosis in the specific test
case. Here, we report developing a computer-based information management and decision-making
system that relies on one or more measures of the structure of tissue to provide images from a
database that are similar to the sample under consideration. We emphasize that the system does
not provide a diagnosis but simply provides the closest matching cases that enable a pathologist
to make a diagnosis. We also propose here the new idea of constructing a database of
pre-examined prostate tissues and providing similar tissue samples from the database to
pathologists while they examine an unknown tissue sample. To our knowledge, no such system
currently exists. Further, we propose that our system may or may not use infrared chemical
imaging data in comparisons. By comparing with the pre-examined tissue samples, we expect
pathologists to make more consistent and accurate decisions. As we build a database of prostate
tissue samples, we represent each tissue sample by its morphology. Given an unknown tissue
sample,  the  similarities  between  the  unknown  sample  and  the  tissue  samples  in  the  database  are 
measured  based  on  the  morphological  properties,  and  the  most  similar  tissue  samples  are 
retrieved.  The  pathologist  may  indicate  that  certain  matches  were  better  than  others,  resulting  in 
an  updating  of  the  database  and  matching  algorithms  as  needed.  The  updating  may  be  conducted 
in  real-time. 

Work  accomplished: 

Morphological features have been shown to characterize prostate tissues and can be
used for diagnostic purposes. Here, 67 morphological features, which are based on lumens and
epithelial nuclei, were extracted from each tissue sample. The database stores the morphological
features for the tissue samples that have already been examined by pathologists.


Once  we  have  an  unknown  prostate  tissue  sample  (query),  first  of  all,  the  morphological  features 
are  extracted  from  the  tissue  sample.  Secondly,  the  similarities  between  the  query  and  the  tissue 
samples  in  the  database  are  computed  using  Euclidean  distance  based  on  the  morphological 
features.  Lastly,  the  most  similar  k  tissue  samples  to  the  query  are  retrieved  from  the  database. 
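The retrieval step described above reduces to a nearest-neighbor lookup in feature space. The sketch below assumes each tissue sample is stored as a row of a feature matrix and uses plain Euclidean distance as stated; whether the features are rescaled beforehand is not specified in the report, so no scaling is applied, and the function name is illustrative.

```python
import numpy as np


def retrieve_similar(query, database, k):
    """Return indices and distances of the k database rows closest to `query`.

    query:    (n_features,) feature vector of the unknown sample.
    database: (n_samples, n_features) feature matrix of pre-examined samples.
    """
    database = np.asarray(database, dtype=float)
    query = np.asarray(query, dtype=float)
    distances = np.linalg.norm(database - query, axis=1)
    # A stable sort keeps database order among equally distant samples.
    order = np.argsort(distances, kind="stable")[:k]
    return order, distances[order]


# Hypothetical database of three samples described by two features each.
toy_database = [[0.0, 0.0], [3.0, 4.0], [1.0, 0.0]]
nearest_idx, nearest_dist = retrieve_similar([0.0, 0.0], toy_database, k=2)
```

Because Euclidean distance is dominated by features with large numeric ranges, a practical system would likely need some form of feature normalization; that choice is left open here.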

To assess the method, we tested it on a dataset composed of 181
tissue samples. In the dataset, 5, 23, 66, and 21 tissue samples are Gleason grade 2, 3, 4, and 5
cancer (“Cancer”), respectively, and 20 and 46 tissue samples are BPH and normal (“Benign”),
respectively. Due to the small number of tissue samples, Gleason grade 2 is excluded from
further consideration. As mentioned above, each tissue sample is represented by 67
morphological features.

To measure the performance of the method, we adopted the k-nearest neighbor (kNN)
algorithm and predicted the grade of the query by majority voting. Both the accuracy and the
kappa coefficient were computed for the predictions. Since pathologists may be more interested in
grading of cancerous tissue samples, we also applied our method only to the “Cancer” tissue
samples; i.e., Gleason grade 3, 4, and 5 samples.
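Majority voting over the retrieved neighbors and the agreement statistic can be sketched as follows. The report does not say which kappa variant was used or how voting ties were resolved; this sketch assumes unweighted Cohen's kappa and breaks ties in favor of the label that appears first in the neighbor list (that is, the closest neighbor).

```python
from collections import Counter


def majority_vote(neighbor_labels):
    """Most frequent label; ties go to the label that occurs earliest in the list."""
    counts = Counter(neighbor_labels)
    top = max(counts.values())
    for label in neighbor_labels:
        if counts[label] == top:
            return label


def cohen_kappa(y_true, y_pred):
    """Unweighted Cohen's kappa between true and predicted labels."""
    y_true, y_pred = list(y_true), list(y_pred)
    n = len(y_true)
    classes = set(y_true) | set(y_pred)
    # Observed agreement.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Agreement expected by chance from the label marginals.
    p_e = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in classes)
    if p_e == 1.0:
        return float("nan")  # kappa is undefined when only one label occurs
    return (p_o - p_e) / (1.0 - p_e)


vote = majority_vote(["Grade 4", "Grade 3", "Grade 4"])
tie_vote = majority_vote(["Grade 3", "Grade 4"])
kappa = cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0])
```

Kappa corrects raw accuracy for agreement expected by chance, which matters when grades are unevenly represented.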

We performed leave-one-out (LOO) cross-validation on the dataset. LOO holds out one example as
validation data and uses the remaining examples as training data. In our method, the validation
example is the query, and the training data are regarded as the database. It should be noted that the
number of tissue samples in each grade in the dataset varies. This imbalance could
affect the prediction made by the kNN algorithm. To tackle the problem, we randomly selected the
same number of tissue samples from each grade and performed LOO on the sub-dataset. This
was repeated 100 times, and the average accuracy and kappa coefficient were computed over the
repeats.
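The balanced, repeated leave-one-out protocol can be sketched as below. It is a minimal illustration under stated assumptions: each class is subsampled to the size of the smallest class, neighbors are found with unscaled Euclidean distance, voting ties go to the label encountered first, and only accuracy is averaged. The function names and the `seed` parameter are hypothetical.

```python
from collections import Counter

import numpy as np


def balanced_subsample(labels, rng):
    """Indices of a random subset with the same number of samples per class."""
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    n = counts.min()
    picks = [rng.choice(np.flatnonzero(labels == c), size=n, replace=False)
             for c in classes]
    return np.sort(np.concatenate(picks))


def loo_knn_accuracy(features, labels, k):
    """Leave-one-out accuracy of a kNN majority vote."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    correct = 0
    for i in range(len(labels)):
        d = np.linalg.norm(features - features[i], axis=1)
        d[i] = np.inf  # the query itself is left out of the database
        nearest = np.argsort(d, kind="stable")[:k]
        predicted = Counter(labels[nearest].tolist()).most_common(1)[0][0]
        correct += int(predicted == labels[i])
    return correct / len(labels)


def repeated_balanced_loo(features, labels, k, repeats=100, seed=0):
    """Average LOO accuracy over `repeats` balanced random sub-datasets."""
    rng = np.random.default_rng(seed)
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    accuracies = []
    for _ in range(repeats):
        idx = balanced_subsample(labels, rng)
        accuracies.append(loo_knn_accuracy(features[idx], labels[idx], k))
    return float(np.mean(accuracies))


# Hypothetical one-feature data: three samples of class A, two of class B.
toy_x = [[0.0], [0.1], [0.2], [10.0], [10.1]]
toy_y = ["A", "A", "A", "B", "B"]
toy_idx = balanced_subsample(toy_y, np.random.default_rng(0))
toy_accuracy = repeated_balanced_loo(toy_x, toy_y, k=1, repeats=5)
```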

Our  method  is  subject  to  the  choice  of  the  number  of  nearest  neighbors  to  consider  for  the 
prediction  and  the  number  of  features  to  use  for  the  similarity  computation.  To  examine  the 
effect  of  them,  we  computed  the  average  accuracy  and  kappa-coefficient  over  100  repeats  as 
increasing  the  two  factors  (Lig.  1).  The  accuracy  decreases  as  increasing  the  number  of  nearest 
neighbors,  and  the  more  features  we  use,  the  higher  accuracy  achieved.  The  highest  average 
accuracy  achieved  for  grading  both  “ Cancer ”  and  “ Benign  ”  samples  (i.e,  5  grades)  was  42% 
using  7  features  and  1  nearest  neighbor  (Lig.  la).  By  using  8  features  and  1  nearest  neighbor,  the 
highest  accuracy  of  52%  achieved  for  grading  only  “ Cancer ”  samples  (i.e.,  3  grades)  (Lig.  lc). 
Both  cases  also  achieved  the  average  Kappa  coefficient  of  0.27  (Lig.  lb,  d).  In  Lig.  2,  the 
distribution  of  the  grade  of  the  retrieved  samples  is  shown.  Distinction  between  “Cancer”  and 
“ Benign ”  samples  is  obvious  (Lig  2a),  but  among  “Cancer”,  the  retrieved  samples  often  do  not 
belong  to  the  same  grade  with  the  query,  especially  between  Gleason  grade  3  and  4. 
Figure 13. Average accuracy and kappa coefficient. (a), (b) Grading for both “Cancer” and
“Benign” samples; (c), (d) grading for “Cancer” samples. Each line depicts the accuracy
and kappa coefficient values for the corresponding number of features.
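
For reference, the kappa coefficient can be computed from true and predicted grades as in the sketch below. It assumes the reported statistic is Cohen's kappa (the text does not name the variant), and the function name is hypothetical.

```python
from collections import Counter


def cohen_kappa(true_labels, predicted_labels):
    """Cohen's kappa: agreement beyond what is expected by chance.

    Undefined (division by zero) when chance agreement equals 1,
    e.g. when both label lists contain a single identical class.
    """
    n = len(true_labels)
    observed = sum(t == p for t, p in zip(true_labels, predicted_labels)) / n
    true_counts = Counter(true_labels)
    pred_counts = Counter(predicted_labels)
    # Chance agreement from the marginal label frequencies.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (observed - expected) / (1 - expected)
```

A kappa of 0 corresponds to chance-level agreement and 1 to perfect agreement, so a value of 0.27 indicates agreement that is better than chance but far from complete.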


[Figure 14 image: bar charts, one per query grade, showing the grades of the retrieved samples. Panel (a) categories: Grade 3, Grade 4, Grade 5, BPH, Normal. Panel (b) categories: Grade 3, Grade 4, Grade 5.]

Figure 14. Distribution of the grade of the retrieved samples. (a) Grading for both “Cancer”
and “Benign” samples; (b) grading for “Cancer” samples. For the samples in each grade,
the grades of the retrieved samples are counted and the average number of samples is shown.
The arrows denote ±1 standard deviation of the number of samples.


Task:  Develop  protocols  and  validate  Gleason  grading  of  tumor  (months  18-27) 

The task above provides details of the development and LOO validation. More rigorous
validations are needed, but the preliminary results shown here have been used to validate the
grading correspondence and the protocols we have developed, as noted above.

It is important to place the magnitude of our advance in context. Several research efforts have
been made to develop automated systems for the grading of prostate tissues. The majority of
these systems have used texture and/or morphological features to characterize and classify tissue
samples into the correct classes. However, the information that pathologists can obtain from
such methods may be limited, since in general they provide only the predicted grade. The
prediction also relies on the training data. Most importantly, these prior efforts always sought to
match a sample completely to provide a diagnosis, rather than to provide matching candidates.
Further, the role of other modalities in the process was not clear. Here, we may also use IR
chemical imaging data in matching. Our premise is that tissue samples which have the same
grade as the sample of interest, and similar characteristics and patterns, will afford more
information to pathologists; hence, the system enables matching to a database rather than
seeking to provide an unequivocal diagnosis.

Future  outlook  enabled  by  this  progress: 

The matching system would be implemented first for a clinical trial and would then be ready for
commercial translation. While a true clinical trial is the next step, some further development of
the actual methods may be expected. We have built the method into existing software as a
user-friendly tool.


Task 3. Develop mathematical framework to correlate spectral, spatial and clinical
parameters with cancer progression

a.  Identify  and  validate  spectral  metrics  and  develop  spatial  metrics  indicative  of  tumor 
grade  (Months  27-30) 

b.  Develop  prediction  algorithm  for  predicting  outcome  (Months  30-36) 

Activities: 

We have imaged 460 patients with full outcome data and identified several metrics that are
indicative of tumor grade (please see Task 2 as well). A mathematical framework for correlating
the spectral, spatial and clinical parameters with cancer progression has been built using logistic
regression. The prediction algorithm is available for use and will be validated. The task for this
project was to develop the algorithms and, hence, the task is complete.
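
As an illustration of the kind of logistic regression framework mentioned above, the sketch below fits a model by plain gradient descent and returns the predicted probability of progression. It is only a sketch under stated assumptions: the feature vectors, the learning rate, the number of epochs, and the function names are all hypothetical and do not describe the implemented prediction algorithm.

```python
import math


def sigmoid(z):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and intercept by batch gradient descent on the log-loss.

    X: list of feature vectors (e.g. spectral, spatial, clinical metrics).
    y: list of 0/1 outcomes (e.g. 1 = recurrence).
    """
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Predicted probability of the positive outcome for one sample."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))
```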

Task 3. Develop mathematical framework to correlate spectral, spatial and clinical
parameters with cancer progression

Goal:  The  goal  of  this  task  was  to  evaluate  tissue  with  a  view  to  predict  outcome.  The  emphasis 
especially  was  on  spectral  features  (metrics)  that  could  be  used. 


a.  Identify  and  validate  spectral  metrics  and  develop  spatial  metrics  indicative  of  tumor 
grade  (Months  27-30) 

b.  Develop  prediction  algorithm  for  predicting  outcome  (Months  30-36) 

Activities: 

We obtained a set of 460 samples in which patients were matched for age, PSA, grade and stage.
Prostate cancer recurred within 5 years in half of the patients, while the others were
recurrence-free at 10 years. Samples from the entire data set were imaged. We then developed
spectral metrics to characterize the samples. The relatively simple methods developed previously
were not able to separate lethal disease (recurrence within 5 years) from indolent disease (no
recurrence within 5 years). Hence, we examined two avenues. In the first, we sought to examine
whether sources of variance were clouding the segmentation ability of the methods. Knowledge of
the various sources of variance and their effects would help determine the mathematical model to
be used. In the second, we sought to examine a major issue in the development of biomarkers,
namely the difference between data sets, and to determine whether the differences between data
sets could be repaired by computational methods. We describe the first study first and the second
one later.

Analysis  of  Variance  in  IR  imaging  data  from  prostate  tissue 

Most  biomedical  samples,  including  cells  and  tissue  samples  encountered  in  the  prostate,  are 
chemically  complex  and  simple  molecular  compositions  cannot  be  obtained.  Hence,  the  analysis 
of  complex  tissues  often  relies  on  treating  the  IR  spectrum  as  a  signature  of  the  identity  and 
physiologic  state.  Many  studies  seek  to  find  the  spectral  differences  between  given  classes  of 
samples  from  a  statistical,  rather  than  purely  biochemical,  perspective.  These  classes  of  samples 
may  be  different  grades  of  disease  or  benign  tissue,  for  example.  Finding  an  analytical  technique 
that  can  distinguish  between  disease  states  is  of  tremendous  technological  and  medical 
importance  as  it  can  potentially  aid  clinicians  and  help  prevent  errors.  IR  imaging  can  potentially 
provide  a  solution  by  correlating  spectral  or  spatial  features  with  disease  states.  When  suitable 
correlations  between  spectral  differences  and  classes  are  found,  a  protocol  may  be  constructed 
that  allows  for  detection  of  these  disease  states  in  a  practical  application.  Though  conceptually 
straightforward,  this  approach  is  exceptionally  challenging  not  only  because  of  the  subtle 
differences  between  various  components  and  disease  states  in  tissue  but  also  because  of  the 
variation  in  IR  spectra  that  may  arise  due  to  other  factors  and  obscure  differences  between 
disease states. Such variation, which can overwhelm the differences due to disease states, is
likely a primary cause for the failure of many analytical methods to provide robust
protocols. Finally, the sample population under consideration may be of limited size, raising
statistical issues in analyses and inferences. The analyses could be biased as the given samples
may not be representative of the entire population. The latter two considerations can be
addressed by careful study design and subsequent analysis. The question of analytic variability
remains to be resolved and is a topic of much interest in infrared spectroscopy and other
analytical technologies.

Analytic variability can arise from (a) noise in signal measurement, (b) differences within
the tissue that lead to differences both within a given sample and between samples from the
same patient, (c) differences between patients due to biologic diversity, (d) differences due to
sample handling in different clinical settings or research groups and (e) causes not falling
into any of the above categories. The variation may also be understood to be biological, technical
or residual. Biological variation arises from different biological characteristics of samples such
as patients, tissues, cells, subcellular components, etc. It is natural and expected variation, and is
often of interest in an experiment. Technical variation is attributable to both sample preparation
and FT-IR imaging techniques. Potential sources of technical variation include tissue acquisition,
fixation, and sectioning, placement of the tissue section on the slide and post-preparation handling.
The very process of data acquisition also introduces variation, such as measurement noise.
Minimizing technical variation ensures data of high quality. Residual variation refers to the
unexplained variation in the experiment; for example, environmental conditions (room
temperature and humidity) that may not be part of the sample or acquisition characteristics.
Even when sources are thoroughly identified, the list of potential sources of variation may never
be complete. Accordingly, residual variation will be present and, on occasion, can have a
substantial impact on the analyses. In such a case, we may either re-identify potential sources of
variation or redesign the experiment.

Understanding  the  relative  importance  of  each  of  these  factors  and  explaining  the  variance 
observed  in  large  scale  tissue  studies  is  critical  for  developing  any  real-world  application.  While 
an  understanding  of  the  contributions  of  variance  by  various  sources  can  result  in  improved 
protocol designs, the lack of such understanding brings into question the performance of any
developed protocol. Hence, in this manuscript, we develop a framework to understand analytic
variability  and  its  sources  in  infrared  spectroscopic  imaging  of  tissue.  This  understanding  may  be 
extended  to  other  analytical  techniques  and  imaging  modalities,  in  general,  and  may  be  used  to 
improve  the  practice  of  IR  spectroscopic  imaging  for  biomedical  analysis  in  particular.  The  first 
challenge  to  understanding  variability  is  to  obtain  a  data  set  of  sufficient  diversity  and  size. 
Tissue  microarrays  (TMAs),  to  this  end,  are  an  excellent  tool  and  have  been  used  previously  in  a 
number of studies. TMAs consist of many samples of tissue arranged in a grid pattern. Multiple
samples are usually included from the same person, and a population of different people, often
from different clinical settings, is included. Multiple TMAs may further be employed to increase
sample set diversity and size. The effect of the various sources of variation can be analyzed by
applying an analysis of variance (ANOVA) model to the acquired data set. ANOVA is a popular
statistical model for partitioning the total variance of the measured quantity in an experiment into
various identifiable factors (or sources of variation), and has been applied to several kinds of
spectroscopic imaging data: chemical compounds, collagen types, skin lesions, and plant species.
However,  to  our  knowledge,  ANOVA  has  not  been  applied  to  spectroscopic  imaging  of  tissue. 
Here,  we  present  appropriate  ANOVA  models  for  different  experimental  designs  of  IR  imaging 
data  from  TMAs,  evaluate  the  statistical  significance  of  the  sources  of  variance,  estimate 
variance  contributions  of  the  identified  sources,  and  quantify  the  relative  contributions  of  the 
sources  to  the  total  variation  in  the  data.  Finally,  after  examining  the  effect  of  the  sources  of 
variance,  we  also  find  the  most  discriminative  spectral  features  and  address  the  aspects  of  FT-IR 
imaging  and  TMA  techniques  that  can  be  improved  for  better  diagnostic  protocols  in  prostate 
cancer. 

Four experimental TMAs containing prostate tissue samples were obtained from different
sources (Tissue microarray research program at the National Institutes of Health and Clinomics
Inc.). The four TMAs contain, respectively, (i) 86 samples from 16 patients, (ii) 123 samples from
40 patients, (iii) 121 samples from 80 patients, and (iv) 240 samples from 180 patients. FT-IR-TMA
data were acquired at a spatial pixel size of 6.25 µm and a spectral resolution of 4 cm⁻¹. The
spectral profile of a pixel spans a spectral range of 4,000-720 cm⁻¹. FT-IR data are converted into
93-dimensional data where each dimension corresponds to a spectral feature, which can be a peak
ratio, a peak area or a peak center of gravity. We note that the unit of observation in a spectral
analysis is a pixel, but, in designing TMAs, the unit of interest is a tissue sample (called a “core”)
or a patient. The number of pixels, especially of a single histological type such as epithelium,
often varies substantially across cores, and the resulting data imbalance may greatly affect the
results of the analysis. Therefore, we do not employ the entire collection of pixels (or cores) in
TMAs, but address the issue of data imbalance by taking sub-samples of cores and sub-samples of
pixels within each core in an attempt to balance the data for each group. The pixels corresponding
to histologic classes were provided by either an automated histologic recognition method or a
pathologic review.
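
The pixel sub-sampling described above could look like the following sketch. The data layout, the function name, and the rule of dropping any (core, histologic class) group with too few pixels are assumptions made for illustration; the text does not specify how groups with insufficient pixels were handled.

```python
import random
from collections import defaultdict


def subsample_pixels(pixels, n_per_group, seed=0):
    """Draw the same number of pixels for every (core, histologic class) group.

    `pixels` is a list of (core_id, histologic_class, spectrum) tuples
    (assumed layout). Groups with fewer than `n_per_group` pixels are
    dropped, which is an assumption of this sketch.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for core_id, hist_class, spectrum in pixels:
        groups[(core_id, hist_class)].append(spectrum)
    balanced = {}
    for key in sorted(groups):
        if len(groups[key]) >= n_per_group:
            balanced[key] = rng.sample(groups[key], n_per_group)
    return balanced
```

Equal group sizes keep the ANOVA design balanced, which is what makes the coefficients in the expected mean squares simple multiples of the number of pixels per group.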


Between-histologic  class  ANOVA  model.  In  a  typical  TMA  setting,  many  cores  are  placed  in 
an  array,  one  or  more  cores  are  obtained  from  a  patient,  and  cores  are  often  composed  of 
multiple  histologic  classes  such  as  epithelium,  stroma,  muscle,  blood,  and  nerves,  i.e.,  patients 
nested  in  an  array,  cores  nested  in  a  patient,  and  histologic  classes  nested  in  a  core.  Accordingly, 
variability  in  FT-IR-TMAs  data  is  also  distributed  in  a  hierarchical  fashion.  Identifying  five 
potential  sources  of  variation  (array,  patient,  core,  histologic  class  and  residual  error),  we  present 
the  following  ANOVA  model  (“ between-histologic  class  model”)  for  this  TMA  design, 

$$ y_{ijklw} = \mu + \alpha_i + \beta_{j(i)} + \gamma_{k(j(i))} + \delta_l + \alpha\delta_{il} + \beta\delta_{j(i)l} + \gamma\delta_{k(j(i))l} + \omega_{ijklw} + \varepsilon_{ijklw} \tag{1} $$

where $y$ represents the IR absorption of a pixel ($w = 1, \ldots, n$) in a spectral feature of
interest, $\mu$ is the overall mean, and $\alpha$, $\beta$, $\gamma$, and $\delta$ denote the array
($i = 1, \ldots, n_\alpha$), patient ($j = 1, \ldots, n_\beta$), core ($k = 1, \ldots, n_\gamma$), and
histologic class ($l = 1, \ldots, n_\delta$) effects, respectively. $\alpha\delta$, $\beta\delta$, and
$\gamma\delta$ are called interaction effects, whereas $\alpha$, $\beta$, $\gamma$, and $\delta$ are
designated as main effects. $\omega_{ijklw}$ and $\varepsilon_{ijklw}$ represent measurement error
and residual error effects, respectively. In contrast to the hierarchical structure of the array,
patient, and core effects, the histologic class effect is crossed with each of the array, patient,
and core effects. Hence, this design is called a partly nested ANOVA. Since both fixed
(histologic class) and random (array, patient, and core) factors are present, it is also called a
mixed effects ANOVA model (see Supporting Information for details).

The effect of the factors and their true variances can be estimated by computing the ANOVA table
and applying the expected mean squares method, which equates the observed and expected mean
squares18 (see Supporting Information for details). The total variance for model (1) can be
written as

$$ \sigma^2_{\mathrm{total}} = \sigma^2_{\alpha} + \sigma^2_{\beta(\alpha)} + \sigma^2_{\gamma(\beta(\alpha))} + \sigma^2_{\delta} + \sigma^2_{\alpha\delta} + \sigma^2_{\beta\delta} + \sigma^2_{\gamma(\beta(\alpha))\delta} + \sigma^2_{\omega} + \sigma^2_{\varepsilon}, $$

where $\sigma^2_{\alpha}$, $\sigma^2_{\beta(\alpha)}$, $\sigma^2_{\gamma(\beta(\alpha))}$, and
$\sigma^2_{\delta}$ respectively indicate the variance components of the array, patient, core, and
histologic class effects, and $\sigma^2_{\omega}$ and $\sigma^2_{\varepsilon}$ are the variance
components of measurement error and residual error, respectively. $\sigma^2_{\alpha}$,
$\sigma^2_{\beta(\alpha)}$, $\sigma^2_{\gamma(\beta(\alpha))}$, and $\sigma^2_{\delta}$ can be
attributable to biological variation as well as technical variation. They reflect biological
variation because samples possess different biological characteristics, and technical variation
because variation can arise from any step in TMA preparation. $\sigma^2_{\omega}$ belongs to
technical variation and is estimated separately on the assumption that the measurement error
follows an independent and identically distributed Gaussian distribution over the entire spectral
range. We first compute the noise variance over the non-absorbing IR spectral region
(1900-2100 cm⁻¹) and then estimate the measurement error for each spectral feature.
$\sigma^2_{\varepsilon}$ is complex and reflects the combined effects of biological variation
(pixel-to-pixel variation), technical variation (processing error), and other unexplained
experimental variations. Hence, thorough inspection of residual error may be necessary for a
precise and incisive analysis.
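
The noise-variance step can be illustrated with the short sketch below. It assumes a flat baseline within the 1900-2100 cm⁻¹ window, so that the sample variance there reflects noise only. The second function, which propagates that variance to a spectral feature formed as a linear combination of absorbance values, is a hypothetical illustration of how a per-feature measurement error might be derived; the text does not state the actual procedure.

```python
import statistics


def noise_variance(wavenumbers, absorbance, low=1900.0, high=2100.0):
    """Sample variance of absorbance in the non-absorbing window.

    Assumes a flat baseline in the window, so the spread is noise only.
    """
    window = [a for v, a in zip(wavenumbers, absorbance) if low <= v <= high]
    return statistics.variance(window)


def linear_feature_variance(weights, sigma2):
    """Variance of sum(w_i * a_i) for independent noise of variance sigma2.

    Hypothetical propagation rule for features that are linear in the
    absorbance values (e.g. a peak area approximated by a weighted sum).
    """
    return sigma2 * sum(w * w for w in weights)
```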

TMAs are often obtained from different sources, and the effect of the factors could differ
significantly across TMAs. In order to further examine the differences, we estimate the variance
components for each TMA by restricting ANOVA model (1) to a single TMA. That is, we
fitted the IR data of each TMA to the following model,

$$ y_{jklw} = \mu + \beta_j + \gamma_{k(j)} + \delta_l + \beta\delta_{jl} + \gamma\delta_{k(j)l} + \omega_{jklw} + \varepsilon_{jklw}. \tag{2} $$

Similarly, the total variance is
$\sigma^2_{\mathrm{total}} = \sigma^2_{\delta} + \sigma^2_{\beta} + \sigma^2_{\gamma(\beta)} + \sigma^2_{\beta\delta} + \sigma^2_{\gamma(\beta)\delta} + \sigma^2_{\omega} + \sigma^2_{\varepsilon}$.
This model is also a partly nested ANOVA.

Between-array ANOVA model. Different histologic classes possess dissimilar chemical
properties and cellular functions, which may introduce substantial variation into FT-IR-TMA data.
Eliminating the histologic class factor and other related factors from model (1), we further
examine the effect of histologic class on the data. The ANOVA model (“between-array model”)
can be expressed as

$$ y_{ijkw} = \mu + \alpha_i + \beta_{j(i)} + \gamma_{k(j(i))} + \omega_{ijkw} + \varepsilon_{ijkw}. \tag{3} $$

Since all factors are random, it is a nested random effects model. The total variance of the data
can be stated as
$\sigma^2_{\mathrm{total}} = \sigma^2_{\alpha} + \sigma^2_{\beta(\alpha)} + \sigma^2_{\gamma(\beta(\alpha))} + \sigma^2_{\omega} + \sigma^2_{\varepsilon}$.
Since heterogeneous histologic classes are merged into a core, biological variation may increase
in the model.

Between-subcellular component ANOVA model. The histologic classes are composed of a
number of subcellular components (membrane, cytoplasm, nucleus, cytoskeleton, etc.). By
further separating a histologic class into subcellular components, we can examine the effect of
subcellular components for the histologic classes. The model is identical to ANOVA model (1)
above; the only difference is that the histologic class effect is replaced with the
subcellular component effect. Here, we restrict the model (“between-subcellular component
model”) to a single array as follows:

$$ y_{kmw} = \mu + \gamma_k + \varphi_m + \gamma\varphi_{km} + \omega_{kmw} + \varepsilon_{kmw}, \tag{4} $$

where $\varphi$ represents the subcellular component ($m = 1, \ldots, n_m$) effect, which is fixed.
This is a two-factor crossed and mixed effects model, and the total variance is expressed as
$\sigma^2_{\mathrm{total}} = \sigma^2_{\gamma} + \sigma^2_{\varphi} + \sigma^2_{\gamma\varphi} + \sigma^2_{\omega} + \sigma^2_{\varepsilon}$.
As in the between-array model, eliminating the subcellular component factor and the interaction
effect between subcellular component and core from model (4), the following ANOVA model
(“within-epithelium model”) is constructed,

$$ y_{kw} = \mu + \gamma_k + \omega_{kw} + \varepsilon_{kw}. \tag{5} $$

This is a random effects model, and the total variance is expressed as
$\sigma^2_{\mathrm{total}} = \sigma^2_{\gamma} + \sigma^2_{\omega} + \sigma^2_{\varepsilon}$.
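
For the simplest of these designs, the within-epithelium model (5), the expected mean squares method reduces to the textbook one-way random effects calculation sketched below. With $n$ pixels per core, the between-core mean square has expectation $\sigma^2_\omega + \sigma^2_\varepsilon + n\sigma^2_\gamma$ and the within-core mean square has expectation $\sigma^2_\omega + \sigma^2_\varepsilon$. The function name and the truncation of negative estimates to zero are assumptions of this sketch; the measurement error $\sigma^2_\omega$ would be estimated separately and subtracted from the within-core term.

```python
def one_way_variance_components(groups):
    """Method-of-moments estimates for a balanced one-way random effects model.

    `groups` is a list of equally sized lists of pixel values, one per core.
    Returns (sigma2_core, within_core_variance), where the within-core term
    estimates measurement error plus residual error combined.
    """
    a = len(groups)       # number of cores
    n = len(groups[0])    # pixels per core (balanced design)
    grand_mean = sum(sum(g) for g in groups) / (a * n)
    group_means = [sum(g) / n for g in groups]
    ms_between = n * sum((m - grand_mean) ** 2 for m in group_means) / (a - 1)
    ms_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    ) / (a * (n - 1))
    # Equate observed and expected mean squares; clip negative estimates.
    sigma2_core = max((ms_between - ms_within) / n, 0.0)
    return sigma2_core, ms_within
```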

Variance component analysis identifies discriminative features for histologic analysis. Three
TMAs (i, ii, iii) were used in this experiment. From each TMA, 26 sample cores from 13 patients
were selected, and 200 pixels were chosen from each histologic class in a core. The histologic
segmentation was conducted by a Bayesian classifier19, which was built on 18 spectral features and
achieved >0.99 AUC on cell type classification. Although several histologic classes are present,
we only consider epithelium and stroma because data imbalance in the other classes is more severe.
Using the between-histologic class model, the ANOVA table (Table 1) and the portions of total
variance due to the associated factors (Fig. 1A) were computed. 21 out of 93 features were
dominated by histologic class effect; for the remaining features, either array effect or residual
error introduced the most variation into the data. The high variability in histologic class effect
indicates that epithelium and stroma differ greatly in their IR absorption values, i.e., they have
dissimilar chemical properties. Thus, for the purpose of histologic discrimination, these 21
features could serve as good candidates. In fact, only 5 of them were included in
the Bayesian cell type classifier. This may be attributable to the difference between the datasets
or to redundancy in the features. The Bayesian classifier was optimized for the classification, but
variance components only show the ability of a single spectral feature, not the combined
effects of features. It is probable that, due to this redundancy, the rest of the 21 features were not
selected by the Bayesian classifier. Moreover, patient factor has very little effect on the total
variation of the data, indicating that inferences made or models built on the data would be
applicable to the entire patient population with few or no restrictions or complications. Although
its contribution to the total variance is small, the larger variance from core effect than from patient
effect suggests that the selection of cores is more important than that of patients in constructing
TMAs. Since the size of a core is relatively small compared to the entire tissue or organ, it is
likely that some of the selected cores are not representative of the tissue or organ. We also note
that the small number of core samples could affect the variance estimates, and that
interaction effects, by and large, were negligible except for the interaction between core effect and
histologic class effect. Interestingly, there were 19 features which were dominated by array
effect; these may need further assessment and may reveal array-specific characteristics.
Furthermore, we assessed the statistical significance of histologic class effect by computing the
F-test statistic, which is the ratio of the mean square of histologic class effect to the mean
square of the interaction effect of histologic class and array (see Supporting Information for
details). Computing p-values for histologic class effect, we observed that, in general, lower
p-values accompanied larger portions of explained variance (Fig. 7A); a rank correlation
coefficient of -0.57 (p-value ≈ 0.0) was obtained between the portions of explained variance and
the p-values. Thus, both variance components and the F-test confirm the discriminative ability of
the features for histologic analysis of tissue samples.
Figure 15. Portions of explained variance with and without histologic class factor. The
portions of total variance explained by the associated factors are estimated for (a) the between-
histologic class model and (b) the between-array model and plotted over 93 spectral features. (a)
p-values of histologic class effect are shown at the bottom. (a)(b) The spectral features are
ordered by the portion of total variance due to (a) histologic class effect. Interaction effects
are not shown for (a).
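
The rank correlation between the portions of explained variance and the p-values can be computed as in the sketch below. It assumes the reported statistic is Spearman's rank correlation (the text says only "rank correlation coefficient"), implemented here as the Pearson correlation of average ranks; the function names are hypothetical.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

A negative coefficient, as reported here, means that features with larger portions of explained variance tend to have smaller p-values.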


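Table 3. An example of an ANOVA table for the between-histologic class model.

Source                                   df      MS         EMS
Array                                    2       0.01678    $\sigma^2_{\omega} + \sigma^2_{\varepsilon} + 400\sigma^2_{\gamma(\beta(\alpha))} + 800\sigma^2_{\beta(\alpha)} + 10400\sigma^2_{\alpha}$
Patient(Array)                           36      0.02462    $\sigma^2_{\omega} + \sigma^2_{\varepsilon} + 400\sigma^2_{\gamma(\beta(\alpha))} + 800\sigma^2_{\beta(\alpha)}$
Core(Patient(Array))                     39      0.01345    $\sigma^2_{\omega} + \sigma^2_{\varepsilon} + 400\sigma^2_{\gamma(\beta(\alpha))}$
Histologic class                         1       46.9907    $\sigma^2_{\omega} + \sigma^2_{\varepsilon} + 200\sigma^2_{\gamma(\beta(\alpha))\delta} + 400\sigma^2_{\beta\delta} + 5200\sigma^2_{\alpha\delta} + 15600\sigma^2_{\delta}$
Array*Histologic class                   2       0.01180    $\sigma^2_{\omega} + \sigma^2_{\varepsilon} + 200\sigma^2_{\gamma(\beta(\alpha))\delta} + 400\sigma^2_{\beta\delta} + 5200\sigma^2_{\alpha\delta}$
Patient(Array)*Histologic class          36      0.01604    $\sigma^2_{\omega} + \sigma^2_{\varepsilon} + 200\sigma^2_{\gamma(\beta(\alpha))\delta} + 400\sigma^2_{\beta\delta}$
Core(Patient(Array))*Histologic class    39      0.01356    $\sigma^2_{\omega} + \sigma^2_{\varepsilon} + 200\sigma^2_{\gamma(\beta(\alpha))\delta}$
Residual error                           31044   3.443e-4   $\sigma^2_{\varepsilon}$
Measurement error                                4.601e-6   $\sigma^2_{\omega}$

F-test statistic (Histologic class): 3980.5
p-value (Histologic class): 0.000251

df, MS and EMS denote degrees of freedom, mean squares, and expected mean squares,
respectively. * indicates the interaction effect between factors.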
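
As a worked illustration of the expected mean squares method, the sketch below recovers two variance components and the F-test from the mean squares in Table 3. Adjacent rows of the table differ by a single term, so each component follows from one subtraction. Two points are assumptions of this sketch rather than statements from the text: negative estimates are truncated to zero, and the p-value uses the closed form for an F distribution with (1, 2) degrees of freedom, which is the square of a Student's t variable with 2 degrees of freedom. The F statistic computed from the rounded table entries differs slightly from the tabulated 3980.5.

```python
import math

# Mean squares transcribed from Table 3 (rounded values as printed).
ms = {
    "array": 0.01678,
    "patient": 0.02462,
    "core": 0.01345,
    "hist": 46.9907,
    "array_x_hist": 0.01180,
}


def component(ms_upper, ms_lower, coefficient):
    """Solve EMS(upper) - EMS(lower) = coefficient * sigma^2 for sigma^2.

    Negative estimates are truncated to zero (an assumption of this sketch).
    """
    return max((ms_upper - ms_lower) / coefficient, 0.0)


# Array row minus Patient(Array) row leaves 10400 * sigma^2_alpha.
sigma2_array = component(ms["array"], ms["patient"], 10400)
# Patient(Array) row minus Core row leaves 800 * sigma^2_beta(alpha).
sigma2_patient = component(ms["patient"], ms["core"], 800)

# F-test for histologic class: MS(class) / MS(array * class), df = (1, 2).
f_stat = ms["hist"] / ms["array_x_hist"]
# For F(1, 2), F = t^2 with t ~ Student's t(2), whose two-sided tail
# probability is 1 - |t| / sqrt(t^2 + 2).
p_value = 1.0 - math.sqrt(f_stat / (f_stat + 2.0))

print(sigma2_array, sigma2_patient, f_stat, p_value)
```

For this feature the array mean square is smaller than the patient mean square, so the array component is estimated as zero, while the p-value agrees with the tabulated 0.000251 to the printed precision.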


Variance component analysis reveals weak discriminative features for subcellular
component analysis. To examine the effect of subcellular components, in TMA (iv),
epithelial cells were further divided into two subcellular components: cytoplasm-rich and
nucleus-rich. The cytoplasm-rich and nucleus-rich pixels were selected by a pathologic review.
40 cores from 40 patients were chosen, and 100 pixels for each of the two components were
extracted to build the between-subcellular component model. As shown in Fig. 2A, subcellular
component effect is the dominant source of variation for only 9 features, and residual error is the
most dominant factor for the rest of the features. Although subcellular component effect is the
primary source of variation for these 9 features, its variance estimate does not
overwhelm that of the other effects, in contrast to the huge differences between histologic class
effect and other effects in the between-histologic class model. Thus, with the selected
cytoplasm-rich and nucleus-rich pixels, we do not expect to observe a notable difference in IR
spectra. This result is consistent with the previous work where ~0.72 AUC was obtained in
classifying pixels into cytoplasm-rich and nucleus-rich pixels. Residual error is attributable to the
similarity in the underlying chemical components, errors in selecting the cytoplasmic and nuclear
pixels, and limitations in FT-IR imaging. That is, both biological and technical variations
contribute to residual error. Biological variations could be reduced by minimizing cytoplasm and
nucleus segmentation errors or by obtaining higher-resolution FT-IR images. Performing repeated
FT-IR imaging measurements on the same TMAs could reduce the contribution of technical
variations. In addition, we computed the F-test statistic for subcellular component effect as the
ratio of the mean square of subcellular component effect to the mean square of the interaction
effect of subcellular component and core. Analogous to the results from the between-histologic
class model, the larger the explained variance due to subcellular component effect, the lower the
p-values we observed, and a rank correlation coefficient of -0.33 (p-value>0.1) was obtained
between the portions of explained variance and the p-values; however, the p-values were often
very small (~0), which is misleading about the significance of subcellular component effect.
Accordingly, the F-test could not effectively identify discriminative features, whereas the
variance components suggest weakly discriminating features. We also note that the features with
high variation from histologic class effect in the between-histologic class model were, in general,
not dominated by subcellular component effect. This indicates that the chemical properties that
distinguish histologic classes differ from the properties that differentiate subcellular components.
Figure 16. Portions of explained variance with and without subcellular component factor.
The portions of total variance explained by the associated factors are estimated for (a) the
between-subcellular component model and (b) the within-epithelium model and plotted over 93
spectral features. (a) p-values of subcellular component effect are shown at the bottom, and
the blue boxes indicate that the corresponding features are histologic class-dominant. (a)(b)
The spectral features are ordered by the portion of total variance due to (a) subcellular
component effect.

Biological variation is the main source of variation in residual error within a core and
within epithelium. Making no distinction between histologic classes in a core, we fitted the
FT-IR-TMA data used to build the between-histologic class model to the between-array model and
computed the portion of total variance explained by each factor. As shown in Fig. 7B, residual
error is, in general, the dominant source of variation over the 93 features. The effects of the
four other factors (array, patient, core, and measurement error) were relatively small, and
either the array effect or the core effect was usually the second most dominant source of
variation. Sixteen features were dominated by the array effect, of which 11 were also array
effect-dominant in the between-histologic class model. Compared with the between-histologic
class model, combining histologic classes substantially increased residual error in many
features, especially the 21 histologic class-dominant features. Similarly, we constructed the
within-epithelium model using the data fitted to the between-subcellular component model. When
we estimated the portions of variance due to core, measurement error, and residual error
effects, residual error dominated the other two effects in all 93 features (Fig. 8B). Compared
with the variance components from the between-subcellular component model, we again observed a
significant increase in residual error for numerous features, including the 9 subcellular
component-dominant features. Both the histologic class and subcellular component factors group
chemically similar pixels together and thereby decrease the biological variation left in the
residual. This leads us to conclude that biological variation is the main source of variation
in residual error, especially for the 21 histologic class-dominant and 9 subcellular
component-dominant features in the data. Note, however, that because both effects are fixed,
the interpretation of the histologic class effect and the subcellular component effect should
be limited to the population under the experiment.
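
The argument above (removing a grouping factor from the model pushes its variance into the residual) can be illustrated with a minimal sketch. It is not the nested ANOVA model used in this report: it is a one-factor sum-of-squares split on made-up absorbance values, and the function name `variance_shares` and the class labels are assumptions introduced only for illustration.

```python
from statistics import mean

def variance_shares(values, labels=None):
    """Split the total sum of squares into a factor share and a residual share.

    Illustrative one-factor decomposition only; the report's models have
    several nested factors (array, patient, core, ...).
    """
    grand = mean(values)
    total_ss = sum((v - grand) ** 2 for v in values)
    if total_ss == 0:
        raise ValueError("values have no variation to decompose")
    if labels is None:
        # No grouping factor in the model: all variation is residual.
        return {"factor": 0.0, "residual": 1.0}
    groups = {}
    for v, g in zip(values, labels):
        groups.setdefault(g, []).append(v)
    # Between-group sum of squares: group size times squared offset of the group mean.
    between_ss = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    factor_share = between_ss / total_ss
    return {"factor": factor_share, "residual": 1.0 - factor_share}

# Hypothetical pixel values for one spectral feature in two histologic classes.
absorbance = [1.0, 1.2, 0.8, 3.0, 3.2, 2.8]
classes = ["epithelium"] * 3 + ["stroma"] * 3

with_class = variance_shares(absorbance, classes)   # class explains most variance
without_class = variance_shares(absorbance)         # same data, class ignored
```

With the class label in the model, most of the variation in this toy feature is attributed to the factor; with the label dropped, the same variation is counted as residual, which mirrors the increase in residual error reported for the class-dominant features.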

Differences in the effects of the associated factors are observed across TMAs. To
investigate differences in variance estimates across TMAs, each FT-IR-TMA data set was fitted
to the ANOVA model (2). The proportions of the variance estimates were, in general, very
similar across TMAs, and the main effects showed trends similar to those of the
between-histologic class model: 16 of the 21 histologic class-dominant features
(between-histologic class model) showed high variability due to the histologic class effect
across all three TMAs, and the rest of the features were mostly dominated by residual error
across TMAs. Examining the 19 array-dominant features from the between-histologic class model,
we observed differences across TMAs in the variance components of not only the histologic class
effect but also other main and interaction effects. In Fig. 9, for the first four features,
residual error was the most dominant source of variation, but the relative order of the other
factors varied greatly across TMAs; for example, the histologic class effect and patient effect
in TMA (i) differ from those in the other two TMAs. The next four features showed unusually
high variability from the histologic class effect in TMA (i) and moderate dominance in TMA
(iii), but in TMA (ii) the effect was not dominant, or its contribution was close to that of
residual error. For the last 11 features, differences in the portions of variance due to both
main and interaction effects were also observed. For histologic analysis, these 19
array-dominant features may be avoided. In particular, the four features that introduce high
variation from the histologic class effect in TMA (i) could be specific to the population
represented by TMA (i), and thus may distract the histologic analysis and its translation into
clinical practice. Computing p-values of the histologic class effect, we found, as in the
between-histologic class model, that features with higher variance
components  possess  lower  p-values,  but  weaker  correlations  between  them  (rank  correlation 
coefficients  of  -0.36  — 0.43)  were  observed.  We  note  that  the  computation  of  F-test  statistic  is 
not  identical  to  between-histologic  class  model.  Here,  the  denominator  is  the  mean  square  of  the 
interaction  effect  between  histologic  class  and  patient. 

Figure 17. Portions of explained variance for array-dominant features across TMAs.
Portions  of  variance  are  shown  for  (a)  between-array  model  and  (b,c,d)  between-histologic 
class  model  restricted  to  each  of  (i,ii,iii)  TMAs.  Spectral  features  are  ordered  by  the  portion 
of  total  variance  due  to  (a)  array  effect. 
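
The F-test described above, with the class-by-patient interaction mean square as the denominator, can be sketched for a small balanced design. This is a simplified two-factor illustration under stated assumptions (equal replicates per cell, made-up numbers, hypothetical function name `f_stat_class_vs_interaction`); it is not the full per-TMA model fitted in this report.

```python
from statistics import mean

def f_stat_class_vs_interaction(cells):
    """F statistic for the class effect tested against the class x patient interaction.

    cells[i][j] holds the replicate values for class i and patient j.
    Assumes a balanced layout: same number of patients per class and
    same number of replicates per cell.
    """
    a, b = len(cells), len(cells[0])
    n = len(cells[0][0])
    cell_mean = [[mean(cells[i][j]) for j in range(b)] for i in range(a)]
    class_mean = [mean(cell_mean[i]) for i in range(a)]
    patient_mean = [mean([cell_mean[i][j] for i in range(a)]) for j in range(b)]
    grand = mean(class_mean)

    # Sum of squares for the class main effect.
    ss_class = b * n * sum((m - grand) ** 2 for m in class_mean)
    # Sum of squares for the class x patient interaction.
    ss_inter = n * sum(
        (cell_mean[i][j] - class_mean[i] - patient_mean[j] + grand) ** 2
        for i in range(a)
        for j in range(b)
    )
    ms_class = ss_class / (a - 1)
    ms_inter = ss_inter / ((a - 1) * (b - 1))
    # Denominator is the interaction mean square, not the residual mean square.
    return ms_class / ms_inter

# Toy data: 2 classes x 2 patients x 2 replicates (values are hypothetical).
toy = [
    [[0.5, 1.5], [2.5, 3.5]],
    [[3.5, 4.5], [7.5, 8.5]],
]
f_value = f_stat_class_vs_interaction(toy)  # 16.0 for this toy data
```

Using the interaction mean square in the denominator means the class effect is judged against how much it varies from patient to patient, rather than against pixel-level residual noise.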


Correlating Changes Between Two Data Sets

Most model-building processes rest on an underlying assumption: a learned classifier should be
usable to explain unseen data from the same problem. Although seemingly reasonable, this
assumption tends to fail for biological data: classifiers built from data generated using the
same protocols in two different laboratories can turn out to be different and
non-interchangeable. There are usually too many uncontrollable variables in the process of
generating data in the lab, in addition to biological variation, and small differences can lead
to very different data distributions, with a fracture between the data. This work presents a
genetics-based machine learning approach that performs feature extraction on data from one
laboratory to increase the classification performance of an existing classifier built using
data from a different laboratory that uses the same protocols, while learning about the shape
of the fractures between the data that caused the poor performance. This is a critical step in
understanding the differences between our prostate cancer data sets.


The specific problem this study attempts to solve is the following. We have data from one
laboratory (dataset A) and derive from it a classifier that predicts its categories accurately.
We are then presented with data from a second laboratory (dataset B). The classifier we had
previously built does not predict this second dataset accurately because of a fracture between
the data of the two laboratories. We intend to find a transformation of dataset B (dataset S)
on which the classifier works. Evolutionary computing, as introduced by Holland, is based on
the idea of the survival of the fittest, evoked by the natural evolutionary process. In genetic
algorithms (GAs), solutions (genes) are more likely to reproduce the fitter they are, and
sporadic random mutations help maintain population diversity. Genetic programming (GP) is a
development of those techniques and follows a similar pattern to evolve tree-shaped solutions
using variable-length chromosomes. Feature extraction ‘consists of the extraction of a set of
new features from the original features through some functional mapping’. Our approach can be
seen as feature extraction, since we build a new set of features that are functions of the old
ones. However, our goal differs from that of classical feature extraction: our intention is to
fit a dataset to an already existing classifier, not to improve the performance of a future
one. In this work, hence, we intend to demonstrate the use of GP-based feature extraction to
unveil transformations that improve the accuracy of a previously built classifier, by
performing feature extraction on a dataset where said classifier should, in principle, work but
where it does not perform accurately enough. We tested our algorithm first on artificially
built problems (where we apply ad hoc transformations to datasets from which a classifier has
been built and use the resulting dataset as our problem dataset), and then on a real-world
application where TMA data from two different medical laboratories regarding prostate cancer
diagnosis are used as datasets A and B. Even though the proposed method does not attempt to
reduce the number of features or instances in the dataset, it can still be regarded as a form
of data reduction because it unifies the data distributions of two datasets, which makes it
possible to apply the same classifier to both of them instead of needing a separate classifier
for each dataset.
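
The idea of evolving a transformation of dataset B so that a frozen classifier works on it can be sketched as follows. This is a deliberately simplified stand-in, not GP-RFD itself: instead of GP trees it evolves a single scale and shift for one feature, the "classifier" is a fixed threshold, and the data, the fracture (B = 2A + 3), and all names are assumptions made up for the example.

```python
import random

def frozen_classifier(x):
    """Stand-in for the classifier learned on dataset A; it is never retrained."""
    return 1 if x > 5.0 else 0

def accuracy(samples, labels, transform):
    """Fitness: accuracy of the frozen classifier on the transformed samples."""
    scale, shift = transform
    hits = sum(frozen_classifier(scale * x + shift) == y
               for x, y in zip(samples, labels))
    return hits / len(samples)

def evolve_transform(samples, labels, generations=200, pop_size=20, seed=0):
    """Evolve (scale, shift) pairs with elitism and Gaussian mutation."""
    rng = random.Random(seed)
    # Start from the identity transform plus random candidates.
    population = [(1.0, 0.0)] + [
        (rng.uniform(0.5, 2.0), rng.uniform(-5.0, 5.0)) for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=lambda t: accuracy(samples, labels, t), reverse=True)
        parents = population[: pop_size // 2]  # fitter half survives unchanged
        children = [
            (s + rng.gauss(0.0, 0.1), b + rng.gauss(0.0, 0.5))
            for s, b in (rng.choice(parents) for _ in range(pop_size - len(parents)))
        ]
        population = parents + children
    return max(population, key=lambda t: accuracy(samples, labels, t))

# Dataset A (one feature) and its labels; dataset B is A after a hidden fracture.
a_samples = [0.5 * i for i in range(21)]
labels = [frozen_classifier(x) for x in a_samples]
b_samples = [2.0 * x + 3.0 for x in a_samples]

acc_before = accuracy(b_samples, labels, (1.0, 0.0))  # classifier on raw B
best = evolve_transform(b_samples, labels)
acc_after = accuracy(b_samples, labels, best)          # classifier on S
```

Because the fitter half of the population is carried over unchanged and the identity transform is in the initial population, the accuracy on the transformed data can never fall below the accuracy on raw dataset B. The fitness here is driven only by the existing classifier's accuracy, which is the design choice that distinguishes this setting from classical feature extraction.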

In our previous work, we successfully applied a genetics-based approach to develop a
classifier that obtained human-competitive results on FTIR data. However, the classifier built
from the data obtained in one laboratory proved remarkably inaccurate when applied to data from
a different hospital. Since the entire experimental procedure was identical (the same machine,
measurement, and post-processing, and exactly the same lab protocols for both tissue extraction
and staining), there was no apparent factor that could explain this discrepancy. While one
track was to understand the sources of variance, here we examined whether we could bridge the
differences using GAs. The experimental and mathematical details are presented in the attached
manuscript “Repairing fractures between data using genetic programming-based feature
extraction: A case study in cancer diagnosis”.

We summarize below the results for the prostate cancer problem in terms of classifier accuracy.
The results are given in Table 4. In that table, dataset A is the one from the first lab, which
was used to build the classifier; dataset B is the one from the second lab; and dataset S is
the result of applying GP-RFD. To check whether the full dataset B was needed to evolve an
effective transformation, we also tested using just half of it to train GP-RFD and the other
half to test (2-fold cross validation). These results are also included in Table 4. The
performance results are excellent for a number of reasons. First and foremost, GP-RFD was able
to find a transformation of the data from the second laboratory that made the classifier work
just as well as it did on the data from the first lab, effectively finding the hidden
perturbations that prevented the classifier from working accurately. Second, the results show
the generalization power of GP-RFD. As the test results show, GP-RFD does not ‘cheat’ by
over-learning on the known data, and it works well when transforming new, previously unseen
samples. Third, GP-RFD obtained excellent results using just half of dataset B for training,
which highlights the power of the method to unveil the hidden transformation from a relatively
low number of samples. We also performed a Wilcoxon signed-ranks test to evaluate the
performance of GP-RFD on the case study problem. To do so, we used the results from each
partition in the 5-fold cross validation procedure. We ran the experiment four times, resulting
in 4 × 5 = 20 performance samples for the statistical test. As before, R+ corresponds to the
first algorithm in the comparison winning, and R- to the second one. Table 5 shows the results.
The results on the case study problem are exactly the same as those achieved in the benchmark
problems. We can then conclude that GP-RFD was capable of repairing the existing fracture
between the data from the two laboratories. Again, this conclusion assumes that the class
distribution did not change. That holds in this case, since we know the class distribution to
be equal in datasets A and B, but it is an issue to keep in mind when applying the method to
other problems.
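
The rank sums used in the Wilcoxon signed-ranks test can be computed as in the sketch below. It assumes one common convention (zero differences are dropped and tied absolute differences share their average rank); the report does not state which convention was used, and the paired accuracies are made-up values, not the experimental results.

```python
def signed_rank_sums(first, second):
    """Return (R+, R-) for paired samples; R+ sums the ranks where `first` wins."""
    # Assumption: pairs with zero difference are discarded.
    diffs = [a - b for a, b in zip(first, second) if a != b]
    ordered = sorted(diffs, key=abs)
    r_plus = r_minus = 0.0
    i = 0
    while i < len(ordered):
        # Find the run of tied absolute differences starting at position i.
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        avg_rank = (i + 1 + j) / 2  # average of ranks i+1 .. j
        for d in ordered[i:j]:
            if d > 0:
                r_plus += avg_rank
            else:
                r_minus += avg_rank
        i = j
    return r_plus, r_minus

# 20 hypothetical paired accuracies (4 runs x 5 folds) where the first always wins.
first = [0.90 + 0.001 * k for k in range(20)]
second = [0.80 - 0.001 * k for k in range(20)]
r_plus, r_minus = signed_rank_sums(first, second)  # (210.0, 0.0)
```

With 20 non-zero differences the two sums always add up to 20 × 21 / 2 = 210, so a result of 210 versus 0 means one method won in every partition, while a split such as 126 versus 84 indicates no consistent winner.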


Table 4. Classifier performance results

                           Classifier performance in dataset
Validation method          A-training   A-test    B         S-training   S-test
5-fold cross validation    0.95435      0.92015   0.83570   0.95191      0.92866
2-fold cross validation    0.95435      0.92015   0.83570   0.95482      0.93223

Table 5. Statistical testing of the new protocol.

Comparison                  R+    R-    p-value     Null hypothesis of equality
A-test vs B                 210   0     1.91E-007   rejected (A-test outperforms B)
B vs S-test                 0     210   1.91E-007   rejected (S-test outperforms B)
A-training vs S-training    126   84    -           accepted
A-test vs S-test            84    126   -           accepted

We have presented GP-RFD, a new algorithm that addresses a problem that is common in real life
but for which few solutions have been proposed in evolutionary computing: repairing fractures
between data by adjusting the data itself, not the classifiers built from it. We developed a
solution by means of a GP-based algorithm that performs feature extraction on the problem
dataset, driven by the accuracy of the previously built classifier. We tested GP-RFD on a set
of artificial benchmark problems, where a problem dataset is fabricated by applying an ad hoc
disruption to an original dataset, and it proved capable of solving all the transformations
presented, showing good performance on both training and, more importantly, test data. We have
also been able to apply GP-RFD to the problem of prostate tissue classification, where data
regarding prostate cancer diagnosis were provided by two different laboratories and where the
classifier learned from one did not perform well enough on the other. Our algorithm was capable
of learning a transformation of the second dataset that made the classifier fit just as well as
it did on the first one. The validation results with 5-fold cross validation also support the
idea that the algorithm obtains good results and has strong generalization power. Lastly, we
applied a statistical analysis methodology that supports the claim that the classifier
performance obtained on the solution dataset significantly exceeds that obtained on the problem
dataset. There is, however, one point where the proposed method has not been successful: the
learned transformations have failed to provide any information about why the fracture appeared
between the data from the two laboratories.


Task  4.  Write  reports  and  finalize  algorithms  into  software  (Months  33-36) 


A number of reports (invention disclosures, conference reports, etc.) have been written, and
manuscripts based on this work have been submitted and published, as detailed in the following
sections.

In  summary,  the  promised  work  has  been  accomplished  to  a  reasonable  degree  and  has  opened 
up  significant  doors  to  future  progress  in  prostate  pathology  as  a  research  direction  as  well  as  for 
patients  and  clinicians. 


Key  Research  Accomplishments 

• A genetic algorithm-based method to distinguish benign from malignant epithelium using
infrared spectroscopic imaging data was shown to be effective. Large-scale validation shows
promising results, and a manuscript is being written.

• We determined that one of the key factors in understanding our data was the spatial structure
of the tissue, which strongly affected the IR data. A series of simulations was conducted after
developing a rigorous optical model to predict distortions. Results are reported in two
manuscripts in Anal. Chem.

•  A  combination  of  IR  and  conventional  pathology  imaging  has  been  developed  and  extensively 
validated. 

•  A  method  to  correlate  Gleason  grades  with  measured  data  has  been  developed.  Larger 
validation  studies  are  needed. 

•  A  number  of  patent  applications  and  invention  disclosures  as  well  as  peer-reviewed 
publications  have  resulted  from  these  activities. 


Reportable Outcomes

Manuscripts 

11. B. Kwon, C. Wang, K. Park, R. Bhargava, W.P. King “Thermomechanical Sensitivity of
Microcantilevers in the Mid-infrared Spectral Region”, Nano Micro Thermophys Eng, 15, 16-28 (2011)

12. J.G. Moreno-Torres, X. Llora, D.E. Goldberg, R. Bhargava “On the homogenization of data from two
laboratories using genetic programming” Lect. Notes Comp. Sci., 6471/2010, 185-197 (2010), DOI:
10.1007/978-3-642-17508-4_12

13. A.K. Kodali, M.V. Schulmerich, R. Palekar, X. Llora, R. Bhargava “Optimized Nanospherical Layered
Alternating Metal-dielectric Probes for Optical Sensing” Opt. Exp., 18, 23302-23313 (2010)

14. R.K. Reddy, R. Bhargava “Automated noise reduction for accurate classification of tissue from low
signal-to-noise ratio imaging data” Analyst, 135, 2818-2825 (2010) DOI: 10.1039/C0AN00350F

15. A.K. Kodali, X. Llora, R. Bhargava “Optimally tailored Raman spectroscopic probes for ultrasensitive
and highly multiplexed assays” Proc. Natl. Acad. Sci., 107, 13620-13625 (2010) DOI:
10.1073/pnas.1003926107

16. A.K. Kodali, M.V. Schulmerich, J. Ip, G. Yen, B.T. Cunningham, R. Bhargava “Narrowband Mid-
infrared reflectance filters using guided mode resonance” Anal. Chem., 82, 5697-5706 (2010) DOI:
10.1021/ac1007128

17. M.V. Schulmerich, A.K. Kodali, R.K. Reddy, L.J. Elgass, R. Bhargava “Dark field Raman Microscopy”
Anal. Chem., 82, 6273-6280 (2010) DOI: 10.1021/ac1014194

18. R. Kong, R.K. Reddy, R. Bhargava “Characterization of Tumor Progression in Engineered Tissue using
Infrared Spectroscopic Imaging” Analyst 135, 1569-1578 (2010) DOI: 10.1039/c0an00112k

19. B.J. Davis, P.S. Carney, R. Bhargava “Theory of mid-infrared absorption microspectroscopy. II.
Heterogeneous samples” Anal. Chem., 82, 3487-3499 (2010) DOI: 10.1021/ac902068e

20. B.J. Davis, P.S. Carney, R. Bhargava “Theory of mid-infrared absorption microspectroscopy. I.
Homogeneous samples” Anal. Chem., 82, 3474-3486 (2010) DOI: 10.1021/ac902067p

21. X. Llora, A. Priya, R. Bhargava “Observer-Invariant Histopathology using Genetics-Based Machine
Learning” Nat. Computing, 8, 101-120 (2009)

Book  Chapters 

1.  M.J.  Walsh,  R.  Bhargava  “Infrared  spectroscopic  imaging:  an  integrative  approach  to 
pathology”  G.  Popescu,  ed.  “Nanobiophotonics”  McGraw-Hill  (2010) 

2. R.K. Reddy, R. Bhargava “Chemometric methods for biomedical Raman spectroscopy and
imaging” M.D. Morris, P. Matousek, eds. “Emerging Raman Applications and Techniques in
Biomedical and Pharmaceutical Fields”, Springer-Verlag, Berlin Heidelberg (2010)

3. A.K. Kodali, R. Bhargava “Nanostructured Probes to Enhance Optical and Vibrational
Spectroscopic Imaging for Biomedical Applications”, Y.Y. Fu and A. Narlikar, eds. “The
Oxford Handbook of Nanoscience and Technology: Vol. III”, Oxford University Press, Oxford,
UK (2010)

Other  manuscripts 

1.  M.J.  Walsh,  M.J.  Nasse,  F.N.  Pounder,  V.  Macias,  A.  Kajdacsy-Balla,  C.  Hirschmugl,  R. 
Bhargava  “Synchrotron  FTIR  Imaging  For  The  Identification  Of  Cell  Types  Within  Human 
Tissues”  AIP  Conference  Proc.  Vol.  1214  WIRMS  2009  5th  international  workshop  on  infrared 
microscopy  and  spectroscopy  with  accelerator  based  sources  pp.  105-107 

2.  F.N.  Pounder,  R.K.  Reddy,  M.J.  Walsh,  R.  Bhargava  “Validating  the  cancer  diagnosis  potential 
of  mid-infrared  spectroscopic  imaging”  Proc.  SPIE  7186,  art.  no.  71860f 

3. R. Bhargava, B.J. Davis “Histologic models for optical tomography and spectroscopy of
tissues” Proc. SPIE 7174, 71742H (2009), DOI: 10.1117/12.810119


Presentations 


Invited  conference  presentations 

First  author  is  the  presenting  author;  First  author  is  also  the  invited  author  unless  indicated  by  * 

1. R. Bhargava, M.J. Walsh, R.K. Reddy, J.T. Kwak, A. Balla “Chemical imaging for histopathology”
New  Frontiers  and  Grand  Challenges  in  Laser-Based  Biological  Microscopy,  Telluride,  CO,  August 
2011 

2.  R.  Bhargava  “Infrared  spectroscopic  imaging  for  label-free  biomedical  informatics”  OSA  topical 
meeting on optical sensors, Toronto, June 2011

3.  R.  Bhargava  “Infrared  spectroscopic  imaging:  a  new  direction  for  an  old  chemical  imaging  technique” 
Central  Regional  Meeting  of  the  ACS,  Indiana,  June  2011 

4.  R.  Bhargava  “Chemical  imaging  for  histopathology”,  Pittcon  2011,  Atlanta,  March  2011 

5.  R.  Bhargava  "Systems  pathology  with  infrared  spectroscopic  imaging"  Pacifichem  2010,  Honolulu,  HI, 
December  2010 

6.  R.  Bhargava,  M.V.  Schulmerich,  A.K.  Kodali,  R.K.  Reddy  and  M.J.  Walsh  “Chemical  imaging  for 
automated  histopathology”,  Eastern  Analytical  Symposium,  Somerset,  November  2010 

7.  R.  Bhargava,  M.V.  Schulmerich,  A.K.  Kodali,  X.  Llora,  R.K.  Reddy,  M.  Kole  “Using  computational 
modeling  to  improve  Biomedical  Raman  microscopy",  Eastern  Analytical  Symposium,  Somerset, 
November  2010 

8.  R.  Bhargava  “Chemical  imaging  for  molecular  pathology”  Cancer  Colloquia  VII,  St.  Andrews, 
Scotland,  November  2010 

9.  R.  Bhargava,  R.K.  Reddy,  J.  Ip,  F.N.  Pounder,  M.V.  Schulmerich,  D.  Mayerich,  X.  Llora,  R.  Kong,  M.J. 
Walsh  “Infrared  Spectroscopic  Imaging  for  Label-Free  and  Automated  Histopathology”  Frontiers  in 
Optics,  Rochester,  October  2010 

10.  R.  Bhargava,  A.  K.  Kodali,  M.  V.  Schulmerich,  X.  Llora  and  R.  K.  Reddy  “Integrating  physics  with 
chemometrics  for  enhanced  vibrational  spectroscopic  imaging”,  FACSS  10,  Raleigh,  October  2010 
[Meggers  award  symposium] 

11. R. Bhargava, M. V. Schulmerich and R. K. Reddy “Discrete frequency infrared spectroscopic imaging
with a quantum cascade laser - rationale and potential”, FACSS 10, Raleigh, October 2010

12. R. Bhargava, P.S. Carney, R.K. Reddy, A.K. Kodali “Modeling distortions in infrared spectroscopic
imaging”, FACSS 10, Raleigh, October 2010

13.  R.  Bhargava  “Enabling  prostate  pathology  with  infrared  spectroscopic  imaging  -  a  roadmap  for  clinical 
translation”,  SPEC2010,  Manchester,  June  2010  (Plenary  opening  lecture) 

14.  R.  Bhargava  “Non-perturbing  cancer  diagnostics  using  infrared  spectroscopic  imaging”,  Pittcon  2010, 
Orlando,  March  2010 

15. R. Bhargava “Progress towards cancer pathology using infrared spectroscopic imaging”, Pittcon 2010,
Orlando,  March  2010 

16.  R.  Bhargava  “Enabling  systems  pathology  by  infrared  spectroscopic  imaging”,  Pittcon  2010,  Orlando, 
March  2010 

17.  R.  Bhargava  “Pathology  without  pathologists?”  Pathological  Society  of  Great  Britain  and  Ireland, 
London,  January  2010 

18. R. Bhargava, R.K. Reddy, M. Schulmerich, A.K. Kodali, F.N. Pounder, B.J. Davis “Next-generation
infrared imaging for biomedical spectroscopy”, FACSS 09, Louisville, October 2009

19. R. Bhargava, J. Ip, A.K. Kodali, F.N. Pounder, B.J. Davis “Ultrafast IR imaging for Biomedical
applications”, ICAVS-5, Melbourne, July 2009 (Plenary Lecture)

20.  R.  Bhargava  “Imaging:  Does  it  really  offer  more  than  'just'  pretty  pictures”,  SAS  50  years  symposium, 
Pittcon  09,  Chicago,  March  2009 

21.  R.  Bhargava,  R.K.  Reddy  “The  critical  role  of  controlled  quality  of  spectral  information  and  sampling 
on  automated  histologic  recognition”,  Pittcon  09,  Chicago,  March  2009 

22.  R.  Bhargava,  F.N.  Pounder,  X.  Llora  and  R.K.  Reddy  “Enhancing  the  tissue  segmentation  capability  of 
fast  infrared  spectroscopic  imaging  via  chemometric  methods",  FACSS08,  Reno,  September  2008 

23.  R.  Bhargava,  F.N.  Keith,  R.K.  Reddy  and  A.K.  Kodali  “Practical  infrared  spectroscopic  imaging 
instrumentation  for  translating  laboratory  results  to  clinical  settings”,  FACSS08,  Reno,  September  2008 


24.  R.  Bhargava  “Spectroscopic  Imaging  for  an  Automated  Approach  to  Histopathologic  Recognition  in 
Prostate  Tissue”  82nd  Annual  North  Central  Section  American  Urological  Association  Meeting,  Chicago, 
September  2008 

25.  R.  Bhargava,  R.K.  Reddy,  A.K.  Kodali  “Ultrafast  mid- infrared  spectroscopic  imaging  by  combined 
computational  and  experimental  optimizations”  ISSSR  2008,  Hoboken,  June  2008 

26.  R.  Bhargava,  R.K.  Reddy,  R.  Kong,  G.  Srinivasan  “Engineering  practical  protocols  for  histopathology 
of  human  tissues  and  models  using  infrared  spectroscopic  imaging”,  Pittcon08,  New  Orleans,  March 
2008 

27.  R.  Bhargava,  R.K.  Reddy,  R.  Kong,  F.  N.  Keith,  G.  Srinivasan  “Automated  Cancer  Histopathology  by 
Practical  Infrared  Spectroscopic  Imaging:  Progress  and  Potential”  The  International  Conference  on 
Perspectives in Vibrational Spectroscopy (ICOPVS), Thiruvananthapuram, Kerala, India, February
2008  (Plenary  Lecture) 


Other  invited  presentations 

28. National Institute for Standards and Technology (NIST), Gaithersburg, 2011

29.  Center  for  nanoscale  science  and  technology  annual  symposium,  University  of  Illinois,  Urbana,  2011 

30.  Young  Breast  Cancer  Survivors  coalition  symposium,  Urbana,  2010 

31. Synchrotron Research Center, Madison, WI, 2010

32.  iOptics  Seminar,  University  of  Illinois  at  Urbana-Champaign,  Urbana,  2010 

33.  Bruker  Optics  users  meeting,  Boston,  2010 

34.  University  of  Illinois  Cancer  Center,  UIC,  Chicago,  2010 

35.  Department  of  Bioengineering,  Ohio  State  University,  2009 

36.  Beckman  Institute  Director’s  Seminar  Series,  UIUC  2009 

37.  Department  of  Chemistry,  University  of  Tennessee,  Knoxville,  2009 

38.  Biointerest  Group  Seminar,  Mechanical  Science  and  Engineering,  UIUC,  2008 

39. Lester Wolfe Workshop, MIT, 2008

40.  Translational  Biomedical  Research  Seminar,  Veterinary  Medicine,  UIUC,  2008 

41.  Vistakon,  A  Division  of  Johnson  and  Johnson,  Jacksonville,  2008 

42. Laser Science Center, Indian Institute of Technology, Kanpur, 2008

Contributed  presentations 

First  author  is  the  presenting  author,  unless  indicated  by  * 

1. M.J. Walsh, D. Mayerich, E.F. Wiley, R. Emmadi, A. Kajdacsy-Balla, R. Bhargava “Mid-Infrared
Spectroscopic Imaging for Breast Tissue Histopathology: Towards ‘Stainless Staining’” 1st Congress
of  the  International  Academy  of  Digital  Pathology,  Quebec,  Canada,  August  2011. 

2.  R.K.  Reddy,  B.J.  Davis,  P.S.  Carney,  R.  Bhargava  “Modeling  Fourier  transform  infrared 
spectroscopic  imaging  of  Prostate  and  breast  cancer  tissue  specimens”  IEEE  International  Symposium 
on  Biomedical  Imaging  (ISBI),  Chicago,  March  2011 

3.  J.T.  Kwak,  S.  Sinha,  R.  Bhargava  “Histological  segmentation  for  infrared  spectroscopic  imaging 
using  frequent  pattern  mining”  IEEE  International  Symposium  on  Biomedical  Imaging  (ISBI), 
Chicago,  March  2011 

4.  M.J.  Walsh,  R.  Bhargava  “Towards  Comprehensive  Histopathological  Analyses  in  Breast  and 
Prostate  Tissue  Using  Mid-IR  Spectroscopic  Imaging”  FACSS  2010,  Raleigh,  October  2010 

5.  R.K.  Reddy,  B.J.  Davis,  R.  Bhargava  “Enhanced  Models  for  Fourier  Transform  Infrared  (FT-IR) 
Spectroscopic  Imaging  of  Human  Tissue  Specimens”  FACSS  2010,  Raleigh,  October  2010 

6.  M.J.  Walsh,  J.  Ip,  C.  Cvetkovic,  R.  Bhargava  “Mid-IR  Imaging  for  Identification  of  Cells  and  Mucin 
Subtype  in  the  Gastrointestinal  Tract”  FACSS  2010,  Raleigh,  October  2010 


7.  B.  Kwon,  M.V.  Schulmerich,  L.  Elgass,  R.  Kong,  S.  Holton,  R.  Bhargava,  W.P.  King  “Infrared 
Imaging  Spectrometry  using  an  Atomic  Force  Microscope”  MRS  Fall  Meeting,  Boston,  November 
2010 

8.  M.J.  Walsh,  R.  Bhargava  “Histopathological  Analyses  in  Breast  and  Prostate  Tissue  Using  Mid-IR 
Spectroscopic  Imaging”  SPEC  2010,  Manchester,  June  2010 

9.  R.K.  Reddy,  R.  Bhargava  “Modeling,  Data  Visualization  and  Histopathology  using  Fourier 
Transform  Infrared  (FT-IR)  Spectroscopic  Imaging  of  Human  Tissue  Specimens”,  BMES  2009 
Pittsburgh,  PA,  October  2009 

10.  M.  J.  Walsh,  M.  J.  Nasse,  F.  N.  Pounder,  V.  Macias,  A.  Kajdacsy-Balla,  C.  Hirschmugl,  R.  Bhargava 
“Mid-infrared  spectroscopic  imaging  of  prostate  tissue  towards  cancer  diagnosis  and  prognosis”, 
BMES  2009,  Pittsburgh,  October  2009 

11.  M.J.  Walsh,  M.  J.  Nasse,  F.  N.  Pounder,  V.  Macias,  A.  Kajdacsy-Balla,  C.  Hirschmugl,  R.  Bhargava 
“Synchrotron  FT-IR  imaging  for  identification  of  cell  types  within  human  tissues”,  WIRMS  2009, 
Banff,  Canada,  September  2009 

12.  F.N.  Pounder,  R.  Bhargava  “Human-Competitive  Histologic  Follow-up  to  Breast  Cancer  Screening  with 
Mid-IR  Spectroscopic  Imaging,”  Pittcon,  Chicago,  March  2009 

13. R.K. Reddy, R. Bhargava “Automated and fast histologic characterization in urology: progress towards
an  unmet  clinical  need”,  Urology:  Diagnostics,  Therapeutics,  Robotics,  Minimally  Invasive,  and 
Photodynamic  Therapy,  BiOS  2009,  San  Jose,  CA 

14.  R.K.  Reddy,  F.N.  Pounder,  R.  Bhargava  “Validating  the  cancer  diagnosis  potential  of  mid-infrared 
spectroscopic  imaging”,  SPIE  Photonics  West  -  BiOS  2009,  San  Jose,  CA 

15.  J.  Ip,  R.  Bhargava  “Integrating  instrumentation,  computation  and  sampling  for  a  high  throughput 
approach  to  automated  histology  by  mid-infrared  microscopy”,  Advanced  Biomedical  and  Clinical 
Diagnostic  Systems  VII,  SPIE  Photonics  West  -  BiOS  2009,  San  Jose,  CA 

16.  M.J.  Walsh,  F.N.  Pounder,  R.  Bhargava  “Spectral  pathology  in  breast  cancer  using  mid-infrared 
spectroscopic  imaging”,  Imaging,  Manipulation,  and  Analysis  of  Biomolecules,  Cells,  and  Tissues  VII, 
SPIE  Photonics  West  -  BiOS  2009,  San  Jose,  CA 

17.  R.  Bhargava,  A.K.  Kodali,  F.N.  Pounder,  R.K.  Reddy  “High-speed  Infrared  Spectroscopic  Imaging  for 
Tissue  Histopathology”,  EAS  2008,  Somerset,  November  2008 

Funding received based on work supported by this award


Project/Proposal Title: Infrared microscopy for prostate pathology (Role: PI)

Source of Support: National Institutes of Health

Total Award Amount: $1,832,819

Total Award Period Covered: 02/01/2010-12/31/2014

Location of Project: Urbana, IL

Person-Months Per Year Committed to the Project: Cal:   Acad:   Sumr: 1.0


Funding  applied  for  based  on  work  supported  by  this  award 


None  at  present 


Employment  or  research  opportunities  applied  for  and/or  received  based  on 
experience/training  supported  by  this  award. 

Dr. Brynmor Davis, a post-doctoral fellow working on this project, obtained employment with
Creare Inc., NH.


Dr. Gokulakrishnan Srinivasan, a post-doctoral fellow working on this project, obtained
employment with Bruker Optics.


Conclusion

The  work  accomplished  demonstrates  clear  potential  and  protocols  for  classifying  prostate  tissue. 
If  the  protocols  are  validated  in  on-going  larger  studies  and  translated  to  the  clinic,  a  new  tool  for 
prostate  histopathology  will  be  available  for  pathologists  and  benefits  will  be  realized  by 
patients. 


So  What  Section 

An  automated  method  to  assist  prostate  pathologists  is  available  and  can  rapidly  determine  the 
presence  of  cancer  in  biopsies.  An  automated  aid  to  grading  is  available  to  aid  pathologists  in 
making  accurate  decisions.  Clinical  translation  of  these  discoveries  can  directly  improve  prostate 
healthcare,  resulting  in  better  treatment  of  individuals. 
Abstract Fourier transform infrared (FTIR) chemical imaging
is a strongly emerging technology that is being
increasingly applied to examine tissues in a high-throughput
manner. The resulting data quality and quantity have
permitted several groups to provide evidence for applicability
to cancer pathology. It is critical to understand,
however, that an integrated approach with optimal data
acquisition, classification, and validation is necessary to
realize practical protocols that can be translated to the clinic.
Here, we first review the development of technology
relevant to clinical translation of FTIR imaging for cancer
pathology. The role of each component in this approach is
discussed separately by quantitative analysis of the effects
of changing parameters on the classification results. We
focus on the histology of prostate tissue to illustrate factors
in developing a practical protocol for automated histopathology.
Next, we demonstrate how these protocols can be
used to analyze the effect of experimental parameters on
prediction accuracy by varying spatial resolution, spectral
resolution, and signal-to-noise ratio. Classification accuracy
is shown to depend on the signal-to-noise ratio of recorded
data, while depending only weakly on spectral resolution.
Introduction

Cancer  is  one  of  the  leading  causes  of  death  in  the  western 
world  and  is  becoming  increasingly  prevalent  worldwide.  It 
is  well  established  that  appropriate  therapy  for  cancers 
diagnosed  early  generally  leads  to  improved  prognosis  and 
longer  survival.  Consequently,  population  screening  tests  to 
detect  disease  are  increasingly  being  deployed.  The 
emphasis  in  screening  populations  is  on  obtaining  a  high 
sensitivity  through  simple  diagnostic  tests.  For  example,  the 
prostate-specific  antigen  (PSA)  assay  [1]  helps  triage 
persons at risk for prostate cancer. A cutoff level (typically
4 ng mL⁻¹) or increase in PSA velocity implies that the
screened person should be under heightened surveillance and
typically undergoes a biopsy to confirm disease. Morphologic
structures in biopsied tissue, as diagnosed by a
pathologist, are the only definitive indicator of disease
and form the gold standard of diagnosis [2]. Along with
clinical history, stage, and PSA values, pathologic diagnoses
form a cornerstone of clinical therapy and serve as a
basis for a vast majority of research activity [3].

Typically,  multiple  samples  are  withdrawn  from  the 
organ  during  biopsy.  Extracted  tissue  samples  are  fixed, 
embedded, and sectioned (typically to 1- to 5-µm thickness)
onto  a  glass  slide  for  review.  By  itself,  tissue  does  not  have 
much  useful  contrast  in  optical  brightfield  microscopy. 
Hence,  the  prepared  slide  is  stained  with  dyes.  A  mixture  of 
hematoxylin  and  eosin  (H&E)  is  commonly  employed, 
staining  protein-rich  regions  pink  and  nucleic  acid-rich 
regions  of  the  tissue  blue,  for  example,  as  shown  in  Fig.  1. 
Using  the  contrast,  a  trained  person  can  recognize  specific 
cell  types  and  alterations  in  local  tissue  morphology  that  are 
indicative  of  disease.  In  prostate  tissue,  epithelial  cells  line 
three-dimensional  ducts.  In  two-dimensional  thin  sections, 
thus,  the  cells  appear  to  line  empty  circular  regions  (lumen). 
Fig. 1 Brightfield microscopy images of unstained (left) and
stained (right) prostate tissue sections. Hematoxylin and eosin
(H&E) stains provide contrast, allowing a trained person to
recognize epithelial cells and ductal structure (lumen), while
ignoring artifacts and confounding morphologies. A trained
human can also learn to robustly recognize patterns within lumen
that indicate cancer. The scale bar corresponds to 100 µm




Distortions  in  normal  lumen  appearance  provide  evidence 
of  cancer  and  characterize  its  severity  (grade).  The  process 
is  fundamentally  a  manual  pattern  recognition  that  seeks 
to  match  observations  to  known  healthy  or  diseased 
morphologies. 

Manual  examination  of  biopsies  is  very  powerful  in  that 
humans  can  not  only  recognize  disease  generally  but  can 
also  overcome  confounding  preparation  artifacts,  detect 
unusual  cases,  and  recognize  deficiencies  in  diagnostic 
quality. This capability of considering and neglecting features
based on prior knowledge is crucial for accurate and
robust  diagnoses.  The  process,  however,  is  time  consuming, 
allows  for  limited  throughput  and,  frequently,  leads  to 
variance  in  subjective  judgments  about  the  disease  severity, 
i.e.,  grade  [4].  As  an  alternative,  computer-based  pattern 
recognition  approaches  to  diagnose  disease  may  provide 
more  accurate,  reproducible,  and  automated  approaches  that 
could reduce variance in diagnosis while proving economically
favorable. Hence, attempts have been made to
characterize  morphology  using  H&E  image  analysis  as 
well  as  biomarkers  to  stain  for  specific  molecular  features. 
Automated  approaches  that  can  rival  human  performance  in 
usual  clinical  settings,  however,  are  still  unavailable. 
Specifically,  the  attributes  of  high  accuracy  and  robust 
applicability  are  lacking. 

The  information  content  of  H&E-stained  images  is 
limited  and  attempts  to  automatically  recognize  structural 
patterns  indicative  of  prostate  cancer,  unfortunately,  have 
not led to clinical protocols. Similarly, probe-based molecular
imaging can provide exquisite information regarding
the  location  and  content  of  specific  epitopes  but  is  limited 
by  complex  diseases  not  expressing  universally  the  same 
epitopes  or  panels  of  markers.  Stains  used  can  generally 
detect  one  feature  that  may  aid  diagnosis  (e.g.,  AMACR) 
but  do  not  provide  entire  diagnostic  information  in 
themselves.  An  exciting  alternative  is  emerging  in  the  form 
of  chemical  imaging  and  microscopy  [5].  As  opposed  to 
conventional dye-assisted imaging or probe-assisted molecular
imaging, chemical imaging [6] seeks to directly
measure  the  identity  and/or  concentration  of  chemical 
species  in  the  sample  using  spectroscopy.  Hence,  no 


molecular  probes  (MPs)  are  needed  to  see  the  presence  of 
specific  epitopes;  instead  computer  algorithms  are  used  to 
extract information from the data (instead of MP hybridization)
and statistical methods are used to provide
confidence  (as  opposed  to  brown  tints  for  MPs).  The 
approach  is  limited  only  by  the  ability  of  the  technology  to 
sense  specific  types  of  molecules  or  otherwise  resolve 
chemical  species  and  morphologic  structures.  Among  the 
prominent approaches are vibrational spectroscopic imaging,
both Raman and infrared (IR), as well as mass
spectroscopic imaging (MSI) [7, 8] and magnetic resonance
spectroscopic imaging (MRSI) [9]. While each technology
promises a specific measurement (e.g., proteins or metabolic
products) for specific situations (e.g., in vivo or ex
vivo), IR spectroscopic imaging [10] is particularly attractive
for the analysis of tissue biopsies in that it permits a
rapid and simultaneous fingerprinting of inherent biologic
content, extraneous materials, and metabolic state [11-14].
IR spectroscopic imaging, generally practiced using
interferometry and termed Fourier transform infrared
(FTIR) spectroscopic imaging or, succinctly, FTIR imaging,
offers a particular combination of spatial, spectral, and
chemical detail [15]. Limitations of FTIR imaging include
coarser  spatial  resolution  compared  to  Raman  imaging  or 
high  powered  optical  microscopy  and  lack  of  specific 
molecular  detail  compared  to  MSI.  Tissue  biopsies  are 
examined  as  thin  sections  on  a  solid  substrate.  The  tissue  is 
dehydrated and is stable due to fixation. Typically, structures
of pathologic interest are several to hundreds of
micrometers in size, requiring fairly moderate magnifications
for decision making. These conditions imply that the
need  to  image  in  vivo,  at  exceptionally  high  spatial 
resolution,  or  in  aqueous  environments  is  not  critical  and 
that  standard  pathologic  laboratory  processing  can  be 
employed  for  IR  imaging.  Due  to  the  linear  absorption 
process  being  utilized,  the  signal  from  IR  spectroscopy  is 
large  and  readily  obtained,  promising  relatively  simple 
instrumentation.  Hence,  the  technology  provides  a  platform 
that  is  potentially  useful  for  clinical  practice  in  pathology.  It 
must  be  emphasized  that  no  particular  technology  is  ideally 
suited  to  all  applications  but  a  careful  matching  of  the 
Fig. 2 Potential application of FTIR imaging for pathology. The
current  paradigm  of  cancer  diagnosis  and  grading  upon  biopsy 
involves  sample  processing,  staining,  and  pathologist  review  (left, 
shaded  boxes).  To  implement  the  paradigm  of  automated  analysis 
(right,  unshaded  boxes),  IR  chemical  imaging  is  followed  by 
computer  analysis  for  diagnosis.  Since  IR  imaging  is  label-free  and 
non-perturbing,  the  sample  can  be  stained,  providing  the  pathologist 
with  both  IR  chemical  and  conventional  stained  images 


technique  to  the  application  can  lead  to  useful  protocols. 
While the potential advantages of FTIR imaging for
examining tissue biopsies are considerable, practical protocols
for clinical deployment are still being developed by many groups.

Numerous  recent  reviews  are  available  to  address 
biomedical  applications  of  FTIR  spectroscopy  and  imaging 
[16-20],  especially  related  to  diseases  and  cancer.  These 
reviews  address  instrumentation,  the  applicability  to  various 


systems,  spectroscopic  bases  and  classification  algorithms 
for  decision  making,  and  controversial  aspects  in  the 
backdrop  of  the  evolution  of  the  field.  The  commercial 
availability  of  high-fidelity  FTIR  imaging  instruments, 
advances  in  computers  and  data  analysis  algorithms,  and 
increasing  interest  have  combined  to  generate  an  increasing 
volume  of  studies.  At  the  same  time,  there  is  considerable 
debate  emerging  on  various  aspects  of  the  process.  Reports 
study a variety of organs that may not correlate in behavior,
utilize different sample acquisition and processing techniques,
employ different instrumentation, data acquisition,
or handling protocols, and apply a variety of decision-making
algorithms. While this has led to a lively community
of practitioners and exploration of various facets such as
resolution, biological diversity, and chemometric or statistical
methods, studies have generally focused on one aspect.
Many excellent studies have developed each of these
aspects to the point of routine use in advanced laboratories.
The  focus  in  the  field  is  now  on  understanding  biochemical 
signals  and  developing  protocols  from  high  quality  data  that 
can  actually  lead  to  clinical  acceptance.  We  contend  that  the 
development  of  clinical  protocols  is  necessarily  integrative 
and,  in  this  manuscript,  review  first  the  salient  aspects  in 
developing  a  practical,  integrative  approach  to  spectroscopic 
imaging  for  cancer  histopathology.  Second,  we  discuss  the 
issues  of  spatial  selectivity,  sample  size  calculations, 
Fig. 3 Correspondence of conventionally stained and FTIR chemical
images for pathology applications. a Hematoxylin and eosin (H&E)-stained
image of prostate tissue section. Hematoxylin stains negatively
charged nucleic acids (nuclei & ribosomes) blue, while eosin stains
protein-rich regions pink. The diameter of the sample is ca. 500 µm.
Simple univariate plots of specific vibrational modes provide for
enhancement or suppression of specific cell types. b Absorption at


rich epithelial cells in the manner of hematoxylin. c Spatial
distribution of a protein-specific peak (ca. 1,245 cm⁻¹) highlights
differences in the manner of eosin. The entire spectrum can be
analyzed for a series of markers that provide more information than
H&E or univariate images, as shown in d where specific cells are
color coded based on their spectral features (e)
optimization considerations, and potential improvements in
algorithms  that  can  provide  faster  results.  Tests  to  determine 
performance  and  limits  of  accuracy  are  reported  as  a 
function  of  experimental  parameters.  We  focus  here  on 
prostate  histology  as  an  illustrative  test  case,  but  emphasize 
that  the  approach  is  applicable  and  similar  insight  is  gained 
with other tissues [21]. Further, exciting results have
recently been reported for diagnosis, grading, and classification
of prostate cancer [22-26], including the effects of
zonal  anatomy  [27]  and  cytokinetic  activity  on  spectra  [28]. 
An  extension  of  the  methodology  here  to  pathology  will 
help formulate better protocols and allow a better understanding
of the performance of classifiers.

Approach  and  essentials 

The  promise  of  chemical  imaging  for  pathology  is 
illustrated  in  Fig.  2.  Our  approach  has  been  to  attempt 
integration  of  our  developments  with  current  clinical 
practice.  Hence,  we  employ  tissues  that  have  been  biopsied, 
fixed,  embedded,  and  sectioned  as  per  usual  clinical 
protocols. We differ in the deparaffinization step, suggesting
a gentle wash with hexane, and do not stain the tissue.
Additionally, as IR chemical imaging only employs benign
light, it is non-perturbing and entirely compatible with all
downstream pathology processes. Hence, the sample may
be stained as usual (Fig. 2, dashed arrow, top). Visualizations
similar to those observed in conventional pathology
are possible without staining the tissue. For example,


Fig.  3  correlates  H&E  and  infrared  spectral  images. 
Visualizations  similar  to  H&E  images  may  be  “dialed-in” 
by  utilizing  specific  spectral  features  indicative  of  tissue 
chemistry. Although the IR images shown here are only univariate
representations, automated mathematical
algorithms  can  determine  the  cell  types  and  their  locations 
within  the  image,  while  providing  quantitative  measures  of 
accuracy  and  statistical  confidence  in  results  [29].  These 
data  may  be  employed  to  directly  provide  diagnoses  or  to 
inform  the  pathologist  (Fig.  2,  dashed  arrow,  bottom), 
helping  them  make  better  decisions.  Since  the  results  are 
images,  information  exchange  between  spectroscopists  and 
clinicians is facilitated. Spectroscopic analyses can potentially
be fully automated; thus, no additional users need to
be trained and no new knowledge base needs to be acquired by
current clinicians.

A  major  challenge  in  the  field  is  the  development  of 
robust  algorithms  that  employ  spectral  data  to  provide 
histopathologic information. Both supervised and unsupervised
approaches have been employed. We believe that
unsupervised  methods  are  more  suited  to  research  and 
discovery.  Supervised  methods  are  preferred  when  the  data 
need  to  be  related  to  known  conditions,  e.g.,  clinical 
diagnoses.  The  development  of  supervised  classification 
of  IR  chemical  imaging  data  for  histopathology  is  fairly 
straightforward [30]. The process is shown in Fig. 4. First, a
model for classification is selected. The model comprises
all possible outcomes for any pixel in the images and is,
hence, bounded by definition. We term each histologic
constituent  of  the  model  a  class  to  denote  that  it  may  not 
correspond  to  specific  cell  types  or  entities  corresponding 


Fig. 4 Process for relating pathologic or physiologic state to
FTIR chemical imaging data. A model is chosen for supervised
classification (a). b-d Training data are reduced in size and
optimized into a prediction algorithm using gold standard
data. The developed algorithm is validated against a second,
independent data set and the accuracy is measured using
three different methods: ROC curves, confusion matrices, and
image comparisons
to morphology-based pathology. While this allows for
simplifications  and  allows  the  user  to  focus  on  specific 
cells  relevant  in  disease,  it  is  also  likely  to  prove  useful  in 
the  discovery  of  different  chemical  entities  that  appear 
morphologically  identical. 

Next, data from a large number of tissue samples are
recorded. A set of pixels is specifically marked (gold
standard) by different colors to correspond to known
regions  of  tissue,  usually  by  comparison  with  an  H&E- 
stained  image  or  with  immunohistochemically  stained 
images  [21].  The  recorded  data  set  is  reduced  to  a  smaller 
set  of  measures  that  capture  the  classification  capability  of 
the  entire  data  set.  We  termed  these  measures  metrics. 
There  are  numerous  means  of  obtaining  the  metric  data 
set:  manual  selection  of  large  spectral  regions,  principal 
components  analysis,  genetic  algorithms,  or  a  sequential 
forward  selection  algorithm.  A  numerical  algorithm  is  then 
chosen,  for  example,  a  linear  discriminant  analysis,  neural 
network,  SIMCA,  or  modified  Bayesian  classifier  [31].  The 
classifier  is  optimized  iteratively,  if  needed,  to  optimally 
predict  the  training  data  set.  Subsequently,  the  algorithm  is 
applied  to  a  second  data  set  (independent  validation)  that 
has been independently marked for each class. A comparison
of the gold standard marking with the computationally
predicted class provides a measure of the accuracy.
We  have  employed  three  measures  of  accuracy:  receiver 
operating  characteristic  (ROC)  curves  [32]  that  represent 
the  sensitivity  and  specificity  trade-off  of  the  classifier, 
confusion  matrices  that  provide  the  fraction  of  pixels  of 
each  class  classified  as  pixels  of  all  classes,  and  classified 
images  that  can  be  compared  pixel-for-pixel  to  other 
images.  Additionally,  it  is  often  instructive  to  drill  into  the 
classifier  to  obtain  the  basis  for  classification  or  the 
distribution  of  confidence  intervals  for  various  samples. 
The  last  two  factors  are  generally  not  apparent  in  previous 
studies. 
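
As an illustration of two of the accuracy measures described above, the following is a minimal Python sketch (not the implementation used in this work) of a row-normalized confusion matrix and an ROC curve computed from gold standard labels and classifier outputs. The function names, the integer encoding of classes, and the use of a single per-pixel score for a one-versus-rest ROC curve are assumptions made for this example; tied scores are not treated specially.

```python
import numpy as np

def confusion_matrix(gold, pred, n_classes):
    """Row i, column j: fraction of gold-standard class-i pixels predicted as class j."""
    gold = np.asarray(gold, dtype=int)
    pred = np.asarray(pred, dtype=int)
    counts = np.zeros((n_classes, n_classes), dtype=float)
    np.add.at(counts, (gold, pred), 1.0)           # accumulate pixel counts per (gold, pred) pair
    row_sums = counts.sum(axis=1, keepdims=True)
    return counts / np.where(row_sums == 0, 1.0, row_sums)

def roc_points(scores, is_positive):
    """False-positive and true-positive rates as the decision threshold is lowered."""
    scores = np.asarray(scores, dtype=float)
    hits = np.asarray(is_positive, dtype=float)[np.argsort(-scores)]
    tpr = np.concatenate([[0.0], np.cumsum(hits) / hits.sum()])            # sensitivity
    fpr = np.concatenate([[0.0], np.cumsum(1.0 - hits) / (1.0 - hits).sum()])  # 1 - specificity
    return fpr, tpr

def area_under_curve(fpr, tpr):
    """Trapezoid-rule area under the ROC curve."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
```

In such a sketch, `gold` would hold the pathologist-marked class of each validation pixel and `pred` the computed class; each row of the confusion matrix then gives the fraction of pixels of one class assigned to every class.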

There  are  three  key  developments  that  are  needed  for 
this  approach  to  be  successful:  (a)  high-fidelity  FTIR 
imaging  instrumentation,  (b)  high-throughput  sampling, 
and  (c)  robust  classification  that  provides  statistically 
significant  results  in  a  manner  that  can  be  appreciated  by 
non-experts  in  spectroscopy.  We  briefly  review  the  three 
developments  next. 


FTIR  imaging 

Need  for  spatially  resolved  data 


The need for spatially resolved data has been recognized
[33], but the effect of limited resolution data on classification
is not entirely clear. The primary complication of
coarse spatial resolution, obviously, arises from boundary
pixels. These can be defined as pixels that are assigned to
one class but would likely yield more classes, to their
physical limits, were finer resolution available. As a
consequence, the spectral content of the boundary pixel is
likely to be mixed and will likely lead to errors in
classification. For example, the confounding contribution
of stromal spectra to cancerous epithelial cells in breast
tissue has been proposed [34]. As the resolution becomes
coarser, the fraction of pixels in an image that are
boundary pixels increases. Inclusion of these pixels has
been shown to be a primary contributor to error rates in data
[29], while their exclusion in accounting for accuracy
necessarily implies that not all pixels are included. We
sought to examine the effect of spatial resolution on the
prevalence of boundary pixels.

We binned data acquired at 6.25-µm pixel size from 148
samples in a validation data set (ca. 7,000 pixels/sample) to 10-,
15-, 20-, 30-, and 50-µm pixel sizes. There is an important
distinction between pixel size and spatial resolution. The
pixel size denotes the best possible optical resolution, which
may be limited by longer wavelengths in the spectrum and
optical effects to yield a poorer measured resolution [35-38].
For each dataset, we classified the tissue and determined
neighbors of each pixel that did not belong to the class of
the pixel. Some pixels that have no neighbors of other
classes may still have empty pixels as neighbors. Since
neighboring empty pixels can only provide optical distortion
[39] but do not affect spectral content, we do not
consider them further. The number of neighbors for
epithelial pixels for different spatial resolutions may be
seen in Fig. 5. The first observation is that a large majority
of pixels have the same class pixels as all eight neighbors.
The fraction of pixels with all neighbors of the same class
decreases rapidly with decreasing resolution and stabilizes
at ca. 20 µm. Hence, a spatial resolution coarser than
20 µm is unlikely to have an effect on the classification but
is expected to lead to about 25% more epithelial pixels
being contaminated compared to 6.25-µm pixel sizes. The
precise effect on a specific sample is very dependent on the
sample morphology and is generally associated weakly
with pathologic state. While in itself, the statistic does not
imply that results from coarser resolution studies will be
invalid, practitioners must recognize that error rates may be
higher and that this contribution may be mitigated by using
commonly available imaging systems.

Fig. 5 Neighbors of cell types other than epithelium or empty space
for different spatial resolutions (axis label: Neighbors of Other
Class). The inset shows the decrease in percent epithelial pixels
that do not have any other cell types as neighbors
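
The neighbor analysis described above can be sketched in Python as follows. This is an illustrative reconstruction under stated assumptions, not the code used for Fig. 5: the classified image is taken to be a 2D integer array, the label 0 is assumed to mark empty pixels, and the function names are hypothetical. For every pixel of a chosen class, the sketch counts how many of its eight neighbors belong to a different, non-empty class.

```python
import numpy as np

EMPTY = 0  # assumed label for pixels that contain no tissue

def other_class_neighbors(labels, target):
    """For each pixel of class `target`, count 8-connected neighbors of another tissue class.

    Empty neighbors (and positions outside the image) are ignored, as in the text.
    """
    labels = np.asarray(labels)
    h, w = labels.shape
    padded = np.pad(labels, 1, constant_values=EMPTY)
    counts = np.zeros((h, w), dtype=int)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            shifted = padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
            counts += (shifted != target) & (shifted != EMPTY)
    return counts[labels == target]

def neighbor_histogram(labels, target):
    """Fraction of `target` pixels that have 0, 1, ..., 8 neighbors of another class."""
    c = other_class_neighbors(labels, target)
    return np.bincount(c, minlength=9) / max(c.size, 1)
```

The first entry of `neighbor_histogram` corresponds to the quantity plotted in the inset of Fig. 5, i.e., the fraction of pixels of the chosen class with no neighbors of another cell type; repeating the calculation on images classified at each pixel size would give its dependence on resolution.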

One danger in classifying mixed-composition pixels is
that they may be classified as an entirely different class
or disregarded from the data set as belonging to no class.
We  simulated  pixels  of  composition  ranging  from  0  to 
100%  for  pairs  of  each  class.  We  also  added  noise  to 
simulate  different  data  acquisition  conditions.  An  example 
of  the  data  can  be  seen  in  Fig.  6.  Average  spectra,  one  each 
from  the  two  classes,  are  baselined  and  added  in  ratios 
varying  linearly  from  0  to  100%.  Figure  6b  demonstrates 
the  classification  of  the  gradient  data  set.  In  general,  the 
classification  works  well,  favoring  the  class  with  higher 
concentration.  The  classifier  is  also  stable  at  the  noise  levels 
examined.  A  surprising  result  is  that  pixels  between 
epithelium  and  fibroblast-rich  stroma  are  classified  as 
mixed  stroma.  This  drawback,  however,  is  the  only 
example  of  two  classes  mixing  to  yield  an  entirely  different 
one.  The  reason  also  stems  from  the  definition  of  the  mixed 
stroma  class.  While  the  class  was  designed  to  handle  those 


stromal  cells  that  were  not  clearly  fibroblasts  or  smooth 
muscle  in  origin  but  appeared  mixed,  a  mix  of  epithelium  and 
fibroblast-type  stroma  also  leads  to  the  classification  as  mixed 
stroma.  Noise  seems  to  have  little  effect  on  this  behavior. 
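
The construction of the gradient data set can be illustrated with a short Python sketch. It assumes that the two class-average spectra are already baselined and sampled on the same wavenumber grid, and that the added noise is zero-mean Gaussian; the function name, argument names, and noise model are assumptions for illustration rather than a description of the simulation actually performed.

```python
import numpy as np

def mixture_gradient(spec_a, spec_b, n_steps=101, noise_levels=(0.0,), seed=0):
    """Linear mixtures of two average spectra, with optional added Gaussian noise.

    Returns an array of shape (n_noise, n_steps, n_bands); along the second axis
    the fraction of class A rises linearly from 0 to 100%.
    """
    spec_a = np.asarray(spec_a, dtype=float)
    spec_b = np.asarray(spec_b, dtype=float)
    rng = np.random.default_rng(seed)
    frac_a = np.linspace(0.0, 1.0, n_steps)[:, None]
    clean = frac_a * spec_a + (1.0 - frac_a) * spec_b
    # One copy of the gradient per noise standard deviation (absorbance units).
    return np.stack([clean + rng.normal(0.0, s, clean.shape) for s in noise_levels])
```

Each simulated spectrum could then be passed to the trained classifier; the fraction at which the assigned class switches, and whether a third class appears near the middle of the gradient, can be read directly from the resulting classified image.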

The  full  simulation  of  all  classes  (not  shown)  reveals  that 
mixed  pixels  generally  can  be  classified  as  the  constituent 
classes  with  the  higher  concentration.  Clearly,  boundary 
pixels at interfaces between epithelium and fibroblast-rich stroma
must be handled with care. The increase in boundary pixels at lower
resolution also implies that this type of systematic misassignment
may arise more frequently. The rate of
occurrence of boundary pixels may be even lower for
synchrotron-based imaging that is conducted at higher pixel
density or in emerging approaches that utilize synchrotron-based
interferometers and array detectors. The simulated
example above, however, demonstrates that simply oversampling
a spatial region to increase pixel density may
allow for better definition of the interface and assignment
of pixels, though it will not address spectral purity. Hence,
for  analyses  based  on  spectral  discrimination,  mixture 
models  will  have  to  be  developed  based  on  entire  spectra. 
For  example,  multivariate  curve  resolution  techniques  hold 
promise. 

A further complication arises in using data from histologic
classification for pathologic diagnoses. For example,
the boundary epithelial pixels classified above may
disproportionately  contribute  to  classification  errors.  We 
have  found  evidence  for  the  same  in  studies  for  both  cancer 
pathology  and  for  histology  in  tissue  from  different  organs. 
For  example,  the  boundary  pixels  in  benign  tissue  get 


Fig. 6 Mixture models and classification for prostate histology.
a Absorbance at 1,080 cm⁻¹ for three classes and their mixtures.
The first column contains mixtures of epithelial cell spectra with
the average spectrum from fibroblast-rich stroma and mixed stroma.
The second and third columns contain mixtures with fibroblast-rich
and mixed stroma, respectively. The concentration changes from 0 to
100% linearly along the y-direction as indicated by the color bar
in c. b Along the x-axis of the composite image, the noise in each
cell increases linearly. Error bars are standard deviations of
noise in the spectra. c Classified image for the data, demonstrating
the effect of composition and noise on classification. d Probability
profiles of the three cell types at columns 1 and 25, demonstrating
the effect of noise
Table 1 Correlation of composition for samples between 6.25-µm pixel sizes and other pixel sizes

Pixel size (µm)   Epithelium         Fibroblast-rich stroma   Mixed stroma
10                0.9913x (0.9976)   0.9847x (0.9923)         1.0300x (0.9957)
20                1.0156x (0.9906)   0.9671x (0.9775)         1.0473x (0.9787)
25                1.0404x (0.9896)   0.9768x (0.9624)         1.0262x (0.9617)
30                1.0720x (0.9773)   0.9683x (0.9507)         1.0175x (0.9363)
50                1.1180x (0.9459)   0.9410x (0.8947)         1.0390x (0.8723)

The first value in each cell denotes the composition factor for that pixel size and class. For example, for every 100 µm², the area of epithelial pixels
at 10-µm pixel size is 99.13% of that at 6.25-µm pixel size. Increasing/decreasing numbers represent pixels being increasingly/decreasingly
classified as that class. The ratios are not uniform for every sample and the regression coefficient of the best fit line passing through the origin is
provided in parentheses in each table cell. Increasing pixel sizes reflect greater variance from the fit line


misclassified as cancerous, leading to the major source of
error in applying this approach to pathology. At this time,
the evidence is anecdotal and needs further investigation to
quantify the extent of the error and its mitigation by
advanced numerical processing. The last interesting aspect
of lower spatial resolution is that it tends to over-predict
certain classes. For example, Table 1 demonstrates the
regression results of each sample's composition against that
obtained at 6.25 µm for three classes. While the regression
coefficient is high, it is clear that epithelial and mixed
stroma fractions are overestimated and fibroblast-rich
stroma is underestimated with increasing pixel size. There
are differences based on underlying pathology. For example,
normal epithelium is generally encountered in 10- to
40-µm-wide strips, while high grade tumor may be
hundreds of micrometers to millimeters in size. The regression
coefficient, which reflects individual sample variability,
decreases with increasing pixel size. In spectroscopic
models to predict diseases that include morphological units
but are based on average spectra, mixed pixels may lead to
estimates with large errors. For example, a 1:1 mixed region
of epithelial and fibroblast pixels at 6.25-µm pixel size
increases to ca. 1.19:1 for 50-µm pixel size. Hence, compositions
derived from histologic mixture models at limited spatial
resolution may not be estimated correctly, providing evidence that the
percentage content of cell types in a limited field of view is
likely to be a less robust measure of tissue histopathology.

Evolution  and  capabilities  of  current  instrumentation 

To overcome confounding by mixing, as discussed above,
microspectroscopy was proposed as an alternative [40].
Single spectra (non-FTIR) have been recorded from
microscopic samples for over 50 years [41] by restricting
light incident on the sample through an aperture. More than
one point, however, is required for tissue analysis. Hence,
sequentially rastering the point at which spectra are
recorded, termed mapping or point microscopy, was
proposed [42]. A practical instrument obtained by coupling
an interferometer, a microscope, and automated stage in the
late 1980s [43] helped in numerous materials science [44],
forensic [45], and biomedical [46, 47] studies. Unfortunately,
the mapping approach has a number of drawbacks in
realizing the goal of an FTIR microscopy analog to optical
microscopy [48].

More than 85% of cancer arises in epithelial cells, which
often form surface layers that are 10- to 100-µm wide. As
we demonstrated in the previous section, however, a
resolution higher than ca. 10 × 10 µm is preferable.
Consequently, the illuminated spot at the sample has to be
made smaller; throughput decreases proportionally, which
in turn decreases the signal to noise ratio (SNR) of acquired
spectra. Orders of magnitude brighter sources, e.g., synchrotrons,
may be employed to recover the lost SNR.
Unfortunately, synchrotrons and free electron lasers [49] are
prohibitively expensive and no laboratory lasers exist for
the wide spectral region. An alternative is to average
successive measurements (co-adding) to increase the SNR
statistically. Since the SNR increases only as the square
root of the number of averaged spectra, long averaging
periods are required. The situation may be mitigated by
using higher condensing optics, sources at higher temperatures,
slightly faster scanning than used here,¹ gain
ranging [50], or ultra-sensitive detectors [51]. Even if a
hypothetical instrument with all these advances were
constructed, only a ca. 10- to 20-fold reduction in time would be
obtained. Furthermore, this calculation underestimates the
time required by not considering losses due to diffraction or
stage movement.
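
The cost of shrinking the aperture can be made concrete with a short calculation. The sketch below assumes, as the paragraph above states, that throughput (and hence SNR) falls in proportion to the illuminated area and that SNR recovers only as the square root of the number of co-added scans. The aperture sizes are hypothetical values chosen for illustration and the function names are not from the source.

```python
import math


def snr_loss_from_aperture(old_side_um: float, new_side_um: float) -> float:
    """Factor by which SNR drops when a square aperture shrinks.

    Assumes signal scales with illuminated area and noise is unchanged.
    """
    return (old_side_um / new_side_um) ** 2


def coadds_to_recover(snr_loss: float) -> int:
    """Scans that must be averaged to win back an SNR loss.

    SNR grows only as sqrt(number of scans), so the count is loss squared.
    """
    return math.ceil(snr_loss ** 2)


# Hypothetical example: 25 um aperture reduced to 6.25 um.
loss = snr_loss_from_aperture(25.0, 6.25)
print(loss)                     # 16.0
print(coadds_to_recover(loss))  # 256
```

A fourfold reduction in aperture side therefore costs a factor of 16 in SNR and a factor of 256 in averaging time under these assumptions, which is why mapping at cellular resolution becomes impractical.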

¹ There is no advantage to faster scanning once the modulation
frequency has reached the optimum level for MCT detectors (1 MHz).
The reduced time to observe signal then decreases the SNR.

In prostate tissue, for example, the situation is similar to
Fig. 1. Epithelial cells form 10- to 35-µm-wide foci around
the cross-sections of ducts. Ducts appear as white circles in
Fig. 1b, surrounded by epithelial cells that are depicted in
blue. To analyze this morphology, aperture dimensions of
ca. 6 µm × 6 µm (= cell size) are proposed [31]; for this
case, the mapping approach would require ca. 1,028 h for a
500 µm × 500 µm sample [31]. Hence, mapping is not a
viable  option.  In  contrast  to  point  mapping  using  apertures, 
large  fields  of  view  are  measured  in  FTIR  imaging. 
Contributions from different sample areas in imaging are
separated by an array of mid-IR-sensitive detection elements
in the manner of imaging with CCD devices for
optical microscopy. By coupling the multichannel detection
of focal plane array (FPA) detectors with the spectral
multiplexing advantage of interferometry, an entire sample
field of view is spectroscopically imaged in a single
interferometer scan [52]. Depending on the microscopy
configuration, thousands of moderate resolution spectra can
be acquired at near-diffraction-limited spatial resolution in
minutes [53, 54]. The time advantage over mapping is
nominally the number of pixels in the FPA (16- to 65,000-fold)
but the noise characteristics of FPAs are poorer than
those of sensitive single-point detectors [55]. Hence, the
SNR-normalized advantage is lower [56]. Faster detectors are
being used for imaging and promise significantly higher
SNR in the same time. For example, we have employed a
128 × 128 element MCT array operating at ca. 16 kHz to
acquire a full data set in ca. 0.07 s [unpublished]. These
rates of data acquisition are approximately a factor of 10
higher than commercially available, but are required for
practical data acquisition times. Increase in data acquisition
speed remains a bottleneck for applications of IR imaging
to routine clinical studies. Coupled with the complexity and
cost of instrumentation, present technology provides preliminary
capability but is likely to prove a barrier to
practical clinical translation.

High-throughput  sampling  and  statistical  pitfalls 

Quantitative  analyses  of  results 

The  best  imaging  instruments  (which  employ  sensitive 
detectors  and  a  small  multichannel  advantage)  can  acquire 
data  in  about  0.1%  of  the  time  required  for  mapping  for 
equivalent parameters. Hence, point mapping studies in
pathology typically exceed the following numbers in only one of these
categories: spatial resolution (ca. 15-20 µm), number of
patients (ca. 50), or number of spectra recorded per
patient (ca. 100). These numbers may typically be improved
by an order of magnitude with imaging. For example, a recent
report analyzed ca. ten million spectra from ca. 1,000
samples at a spatial resolution of 6.25 µm [26]. This
quantitative  validation  is  necessary  for  any  automated 
biomarker  approach  (vide  infra)  [57].  Studies  are  underway 
in  our  and  other  laboratories  to  correlate  spectral  patterns 
with  other  physiologic  and  pathologic  conditions;  recent 
published  studies  verify  the  robustness  and  potentially  wide 
applicability  of  FTIR  microscopy  [58,  59]. 


Sample  size 

Though these studies demonstrate potential [60, 61],
considerable debate exists on reproducibility and accuracy
measures for larger studies [29]. The first response of many
practitioners to new data is a question of validity based on
limited statistical confidence. A detailed understanding is
emerging from the work of several groups regarding
appropriate sample control [62] and confounding factors
due to biology [63]. Inherent differences between patient
cohorts, effects of sample preparations and measurement
noise are topics that can be addressed with the available
imaging technology but are yet to be fully explored. Hence,
validating robust spectral markers for large sample populations
[64, 65] is exceptionally challenging and the
possibility of chance and bias influencing results exists.

Most  importantly,  the  fundamental  question  of  sample 
size required has remained open. There are two major
concerns. The first is the optimal sample size for forming calibration
sets and a prediction algorithm. Second, investigators
must determine whether the results shown can be supported
by statistical considerations. While the first problem is
essentially one of optimizing a model and prediction
algorithm, the second impacts the quality of results and
claims of applicability directly. In this manuscript, we
examine only the second aspect. Determining the optimal
sample size to form robust models is a more involved
problem and is discussed elsewhere. The statistical validity
of obtained results and dependence on data acquisition
parameters are discussed later in this manuscript. Specifically,
we estimate sample size based on the standard error
of the area under the ROC curve (AUC).
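
As an illustration of how a sample size can follow from the standard error of the AUC, the sketch below uses the widely used Hanley and McNeil approximation. The text above does not name the estimator employed, so this choice is an assumption, as are the function names and the target values in the example. The counts refer to independent observations; pixels within one sample are correlated and should not be counted as independent.

```python
import math


def auc_standard_error(auc: float, n_pos: int, n_neg: int) -> float:
    """Hanley-McNeil approximation to the standard error of an AUC."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    return math.sqrt(variance)


def min_samples_per_class(auc: float, target_se: float, max_n: int = 100_000) -> int:
    """Smallest equal class size whose AUC standard error is <= target_se."""
    for n in range(2, max_n + 1):
        if auc_standard_error(auc, n, n) <= target_se:
            return n
    raise ValueError("target standard error not reached within max_n")


# Hypothetical target: an expected AUC of 0.9 known to within +/- 0.02.
print(min_samples_per_class(0.9, 0.02))
```

The search is a plain linear scan because the standard error decreases steadily with class size; a closed-form inversion is possible but adds little for a sketch.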

Gold  standard 

The  selection  of  pixels  as  gold  standards  needs  great  care.  It 
must  be  done  independently  of  any  classifier  training  or 
validation,  thus  ensuring  a  blinded  study  design.  Once  the 
gold  standard  set  is  determined,  it  must  not  be  changed. 
This  will  ensure  that  there  is  no  bias  in  the  process.  Care 
must  be  taken  to  avoid  pixels  that  do  not  lie  on  the  tissue  or 
those  that  are  at  the  boundary  as  these  may  artificially 
inflate  the  error.  The  use  of  all  pixels  in  an  image  has  been 
suggested  and  their  exclusion  has  been  proposed  to 
contribute  selection  bias.  Selection  bias,  however,  does 
not  arise  in  pixels  that  are  chosen  independent  of  validation 
algorithms.  The  exclusion  of  boundary  pixels  is  necessary 
in  both  training  (to  avoid  spurious  probability  distribution 
functions)  and  validation  (to  prevent  introduction  of  errors). 
There  are  major  technological  difficulties  in  relating  stained 
visible  to  IR  images  from  unstained  tissue  due  to  changes 
during staining, leading to errors. Hence, it has been
proposed that the exclusion of boundary pixels is akin to
the performance of a classifier with a reject option for the
boundary.
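
One simple way to exclude boundary pixels from a gold standard set is sketched below. It assumes the gold standard is stored as an integer label image in which 0 marks pixels that do not lie on tissue, and it treats a pixel as interior only if its four nearest neighbours carry the same label. Both the data layout and the four-neighbour criterion are illustrative assumptions, not the procedure used in this work.

```python
import numpy as np


def interior_mask(labels: np.ndarray, background: int = 0) -> np.ndarray:
    """True where a pixel and its 4 neighbours share one non-background label."""
    # Pad with background so pixels on the image edge count as boundary.
    padded = np.pad(labels, 1, mode="constant", constant_values=background)
    center = padded[1:-1, 1:-1]
    same = np.ones(labels.shape, dtype=bool)
    for shifted in (
        padded[:-2, 1:-1],  # neighbour above
        padded[2:, 1:-1],   # neighbour below
        padded[1:-1, :-2],  # neighbour to the left
        padded[1:-1, 2:],   # neighbour to the right
    ):
        same &= shifted == center
    return same & (center != background)


# A 3 x 3 patch of class 1 surrounded by empty substrate.
labels = np.zeros((5, 5), dtype=int)
labels[1:4, 1:4] = 1
print(interior_mask(labels).sum())  # 1: only the central pixel survives
```

Because the mask depends only on the annotated labels, it can be computed once, before any classifier is trained or validated, in keeping with the blinded design described above.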

Sampling, archiving, and consistency

While  it  is  unclear  what  an  optimal  sample  size  would  be,  it 
is  clear  that  a  large  number  of  tissue  samples  are  needed  for 
effective  validation.  While  it  may  theoretically  be  possible 
to  train  on  a  single  sample,  validation  of  a  protocol  is 
required  on  more  samples.  We  recognized  that  one  does  not 
need  to  observe  the  full  surgically  resected  tumor  for 
validating  IR  protocols,  but  would  need  a  representative 
small  section.  Hence,  we  employed  tissue  microarrays 
(TMAs)  [66]  as  a  platform  for  high-throughput  sampling. 
TMAs  consist  of  a  large  number  of  small  tissue  samples 
arranged  in  a  grid  and  deposited  on  the  same  substrate. 
They  are  typically  manufactured  by  embedding  cylindrical 
cores  in  a  receiving  block  and  sectioning  the  block 
perpendicular  to  the  long  axis  of  the  core.  Thin  sections 
are  then  floated  on  to  a  rigid  substrate  for  analysis.  The 
technique  facilitates  rapid  visualization  of  results  of  any 
classification  protocol,  while  revealing  localization  and 
prevalence of any errors. Sample processing rates may
easily be increased 100-fold, valuable tissues are optimally
utilized, and consecutive TMA sections can be used to
correlate with staining results. Construction and analysis of
TMAs has been automated, further increasing the throughput.
For spectroscopists, TMAs provide a ready source of
tissue to test hypotheses and develop prediction models.

The  validity  of  employing  TMAs  for  prostate  cancer 
research  and,  especially,  for  cancer  grading  has  been 
addressed by a number of authors [67]. For example, a
study of genitourinary pathologists [4] using images from
TMA cores found that ca. 90% considered this approach
useful for resident training and for pathology teaching.
Further,  Gleason  score  was  easily  assigned  to  each  TMA 
spot  of  a  0.6-mm-diameter  prostate  cancer  sample.  Hence, 
the  utility  of  TMAs  is  not  only  in  providing  numerous 
samples  in  a  compact  manner  for  the  advantages  above, 
but  also  in  consistency  of  the  diagnoses  and  precision  in 
analyzing  similar  areas.  Virtual  tissue  microarrays  could 
be  constructed  from  different  areas  of  large  samples,  thus 
providing many sub-samples for within-patient and among-patient
comparisons. This approach has not yet been reported
but is likely a useful extension of the TMA concept.

Prediction  algorithms  and  high-throughput  data 
analysis 

Univariate  algorithms 

The major technological advances of fast FTIR microscopy
and high-throughput tissue sampling have been addressed
by imaging and TMAs respectively. There is still some
confusion and widespread disagreement, however, about
the “best” approach to extract histopathologic information
from FTIR imaging data. Several early manuscripts employ
univariate correlations to disease states [68]. While the
results were exciting, it is now realized that they were
statistically flawed and did not necessarily contain a
fundamental basis in cancer biology. To our knowledge,
there is no manuscript that has expressly demonstrated,
using statistical arguments, why univariate analyses are
likely to fail. There is widespread consensus and anecdotal
evidence, however, among practitioners that argues against
the approach. Consider the distributions for a univariate
measure (absorbance at 1,080 cm⁻¹ that is normalized to
the amide I peak height) for benign and malignant cases as
shown in Fig. 7.

Fig. 7 Distribution of absorbance (at 1,080 cm⁻¹, a.u.) for individual spots and all pixels
from each class, normalized by the total number of pixels in the class,
demonstrates that the examination at patient level and at a global level
may not correspond

The normalized histograms reveal that for specific,
single samples the distribution of absorbance at pixels is
such that it clearly indicates the metric to be a good one for
cancer discrimination. When the distribution from all
samples is considered, however, there is little difference in
the distributions. Hence, many univariate measures described
in the literature do not hold up in wide population
testing. A TMA-based, high-throughput validation can
easily prove that the measure is not a good one but does
discriminate some samples. In Fig. 7, a cutoff value can
generally be found that distinguishes disease, leading to the
erroneous conclusion that the feature is universally indicative
of disease state. Since a typical infrared spectrum has
numerous frequencies and even non-chemically specific
features that can provide discrimination, a small number of
samples increases the probability of finding such discrimination
by chance alone. Univariate measures that apparently
provide discrimination when none exists can be
equated to the false discovery rate (FDR) [69] of metrics.
The FDR is very different from the p-value for determining
that a metric separates two distributions; a much higher
FDR can be tolerated than can a p-value. Similarly, a false
negative rate has been proposed [70], which is not critical
for our case as we have observed high accuracy without use
of any erroneously left out metrics. While detailed calculations
and their underlying concepts are too lengthy to
reproduce here, for the sake of completeness, it suffices to
say that for the expected number of metrics demonstrating
discrimination, the FDR tends to zero for larger than ca.
30 samples. While correlations due to chance can be minimized
by this approach, there is potential for unknown
bias or error in prediction for small numbers of samples.
Hence the algorithm must be integrated with sampling
considerations.
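
The way chance discrimination collapses with sample number can be illustrated with a simple combinatorial argument. This is not the FDR calculation referred to above, which is not reproduced in this manuscript. The sketch assumes two classes with equal numbers of independent samples and asks how many of a set of completely uninformative, continuous-valued metrics would be expected to separate the classes perfectly with a single cutoff. The metric count of 1,000 is hypothetical.

```python
from math import comb


def p_perfect_split_by_chance(n_per_class: int) -> float:
    """Chance that one uninformative metric separates two classes perfectly.

    With 2n distinct values and exchangeable labels there are C(2n, n) equally
    likely ways to assign n samples to one class; only 2 of them place one
    class entirely above the other.
    """
    return 2.0 / comb(2 * n_per_class, n_per_class)


def expected_chance_metrics(n_metrics: int, n_per_class: int) -> float:
    """Expected number of uninformative metrics that split the classes."""
    return n_metrics * p_perfect_split_by_chance(n_per_class)


for n in (3, 5, 15, 30):
    print(n, expected_chance_metrics(1000, n))
# n = 3 gives about 100 spurious metrics; n = 5 gives about 8;
# by n = 15 the expectation is of the order of 1e-5.
```

Even under this deliberately strict criterion, a handful of samples almost guarantees apparent biomarkers among the many frequencies in a spectrum, whereas a few tens of samples make them vanishingly unlikely.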

Multivariate  algorithms 

It  was  argued  in  the  previous  section  that  univariate 
analysis  may  not  provide  a  good  measure  of  the  population 
distribution. It can alternatively be argued that the individual
differences in univariate measures are masked if
population measures of the same are employed. Similarly,
multivariate techniques may mask the individual measures
in population testing. Hence, our philosophy has been to
employ  a  multivariate,  supervised  classification  in  which 
the  metrics  are  derived  from  univariate  analyses.  This 
enables  us  to  carefully  examine  each  metric  for  both 
population  as  well  as  individual  sample  relevance.  While 
unsupervised  clustering  approaches  provide  good  insight 
into  spectral  similarity,  a  supervised  method  forces  a 
relation  to  common  clinical  knowledge.  For  example,  as 
shown  in  Fig.  4  for  prostate  tissue,  we  consider  a  ten-class 
model to determine histology. The drawback is that the
sensitivity of the approach to individual samples is
sacrificed for generality. One could potentially combine
clustering and supervised classification. Clustering information
on the training data set would emphasize individual
sample  distributions,  which  would  allow  for  supervised 
classification  tailored  to  each  cluster  type.  Such  an 
approach  has  not  been  implemented  yet  but  is  being 
attempted  in  our  laboratories  to  classify  samples  optimally. 

Dimensionality  reduction 

It is well recognized that the spectrum at each pixel needs
to be reduced to a smaller set of useful descriptors that
capture the essential information inherent in the spectrum.
The reduction of full spectral information to essential
measures helps eliminate from consideration those spectral
features that have no information (non-absorbing frequencies),
little biochemical significance (e.g., apparent absorption
at non-chemically specific frequencies), inconsistent
measures that may degrade classification, and those with
redundant information. The number of useful measures is
significantly smaller than the number of spectral resolution
elements and, hence, the process is also termed dimensionality
reduction. Dimensionality reduction and further
refinement (vide infra) also help reduce the incidence of
prediction by chance alone, computation time, and
storage requirements. Potential measures of a spectrum's
useful features are termed metrics and are defined manually
in our scheme.
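
To make the notion of a metric concrete, the sketch below computes two simple descriptors of the kind mentioned in this manuscript: an absorbance normalized to the amide I peak height and the center of gravity of a band. The exact definitions used in our scheme are not given here, so the formulas, the function names, the nominal amide I position of 1,650 cm⁻¹, and the synthetic two-band spectrum are all assumptions made for illustration.

```python
import numpy as np


def absorbance_at(wavenumbers: np.ndarray, spectrum: np.ndarray, target: float) -> float:
    """Absorbance at the sampled point nearest to `target` (cm^-1)."""
    idx = int(np.argmin(np.abs(wavenumbers - target)))
    return float(spectrum[idx])


def ratio_metric(wavenumbers, spectrum, band: float, reference: float = 1650.0) -> float:
    """Absorbance at `band` normalized to the absorbance at `reference`."""
    return absorbance_at(wavenumbers, spectrum, band) / absorbance_at(
        wavenumbers, spectrum, reference
    )


def center_of_gravity(wavenumbers, spectrum, lo: float, hi: float) -> float:
    """Absorbance-weighted mean wavenumber of the band between lo and hi."""
    window = (wavenumbers >= lo) & (wavenumbers <= hi)
    w, a = wavenumbers[window], spectrum[window]
    return float(np.sum(w * a) / np.sum(a))


# Synthetic spectrum: a band at 1,080 cm^-1 and a stronger one at 1,652 cm^-1.
w = np.arange(900.0, 1800.0, 4.0)
a = np.exp(-((w - 1080.0) / 10.0) ** 2) + 2.0 * np.exp(-((w - 1652.0) / 20.0) ** 2)
print(ratio_metric(w, a, 1080.0, reference=1652.0))
print(center_of_gravity(w, a, 1040.0, 1120.0))
```

Each metric reduces hundreds of spectral points to a single number per pixel, which is the sense in which a manually defined list of metrics performs dimensionality reduction.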

It  may  be  argued  that  the  metrics  are  not  selected  in  an 
objective  manner  due  to  a  human  performing  this  task  and 
some  computer  routines  must  be  employed.  While  the  use 
of  an  automated  computer  program  is  most  certainly 
objective  and  reproducible,  the  algorithm  that  drives  such 
programs  is  generated  from  spectroscopy  knowledge.  A 
well-trained spectroscopist can recognize spectral features
and assign them to their appropriate biochemical basis.
While a computer algorithm may be able to enhance subtle
features in the spectrum, automated peak-picking algorithms
run the risk of substantial error as they are based on
some very specific criteria that may not be universally
valid. We believe that computer algorithms are more suited
to finding correlations and patterns that a human cannot, owing to
the sheer size and complexity of data. Hence, the process of
determining  which  spectral  features  to  consider  is  entirely 
manual  in  our  approach.  It  must  be  emphasized  that  the 
universal  set  of  metrics  is  selected  manually  but  that  the 
data  reduction  step  to  a  set  of  metrics  to  be  used  in 
algorithms  is  entirely  based  on  objective  algorithms. 
Manual refinement of metrics for classification is, obviously,
not recommended because of the possibility of overlooking
specific features, biasing the selection to specific feature
sets, or failing to determine the optimal set of metrics for a
classifier. Dimensionality reduction is also intimately linked
to  the  data  quality  and  classification  algorithm  employed. 

Classification  algorithm 

A  number  of  supervised  algorithms  have  been  applied  to 
dimensionally  reduced  data,  including  those  based  on  linear 
discriminant  analysis,  neural  networks,  decision  trees,  and 
modified Bayesian classifiers. An intermediate step in
some  of  these  algorithms  provides  for  a  fuzzy  result  in 
which  every  pixel  has  a  probability  of  belonging  to  every 
class.  For  example,  in  our  approach,  each  pixel  can  have  a 
probability  (between  zero  and  one)  of  belonging  to  each 
class.  A  discriminant  function  then  assigns  each  pixel  to  a 
class based on a decision rule. The pre-discriminant data
set, termed the rule imaging set, contains important information.
In our algorithm, it is a direct measure of the
probability of the pixel belonging to the class. Hence, the
probability  value  may  be  used  to  compare  the  potential  of 
two  protocols  to  distinguish  a  cell  type  or  to  quantify 
confidence  in  results  for  tissue  classified  by  different 
methods. 
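
The step from rule image to classified image can be sketched as follows. The example assumes the rule data are held as an array of shape (rows, columns, classes) containing per-class probabilities and uses a maximum-probability decision rule with an optional rejection threshold. The manuscript states only that a discriminant function applies a decision rule, so this particular rule, the threshold, and the names are illustrative assumptions.

```python
import numpy as np


def classify_rule_image(rule: np.ndarray, min_prob: float = 0.0, reject: int = -1) -> np.ndarray:
    """Assign each pixel to its most probable class.

    rule     : array of shape (rows, cols, n_classes) with class probabilities
    min_prob : pixels whose best probability is below this value are rejected
    reject   : label written to rejected pixels
    """
    best_class = np.argmax(rule, axis=-1)
    best_prob = np.max(rule, axis=-1)
    return np.where(best_prob >= min_prob, best_class, reject)


# Two pixels, three classes: the first is confident, the second is ambiguous.
rule = np.array([[[0.70, 0.20, 0.10],
                  [0.40, 0.35, 0.25]]])
print(classify_rule_image(rule))                # [[0 0]]
print(classify_rule_image(rule, min_prob=0.5))  # [[ 0 -1]]
```

Keeping the rule data alongside the class image preserves the per-pixel confidence, which is what allows two protocols to be compared on more than their final labels.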

Measures  of  accuracy  and  optimization 

We prefer the use of the area under the ROC curve (AUC) for both optimizing
algorithms and for validating results. Confidence in the
value of the AUC, hence, is the primary test for the validity
of developed algorithms and is characterized by the
standard error of the value. For example, in validating the
discrimination of epithelial from stromal pixels in a blinded
validation set, the cumulative distribution of AUC in a
TMA is shown in Fig. 8. More than 20% of the spots had
an AUC >0.95 and no AUC value below 0.8 was recorded.
One drawback of using ROC curves and AUC values is that
the results are valid for one-at-a-time classification. Hence,
we  have  analyzed  here  the  segmentation  of  epithelium  from 
all  other  cell  types.  The  tissue  is  classified  into  ten  classes 
as  before  but  the  results  are  lumped  into  epithelial  and  non- 
epithelial  pixels.  Further,  not  all  TMA  cores  have  all  types 
of  cells.  Hence,  the  two-class  model  also  allows  us  to 
examine  a  large  number  of  samples.  Last,  we  excluded 
cores  that  did  not  contain  at  least  100  pixels  of  each  class  to 
leave  103  cores  for  the  analysis. 

Fig. 8 Distribution of AUC values in a TMA for discriminating
epithelium from stroma using the ten-class model

Quantitative measures of performance and accuracy are
perhaps the weakest portion of reports using IR spectroscopy
for cancer pathology. Typically, sensitivity and specificity
have been employed as summary measures. While
these are indeed very relevant, we demonstrate that they are
insufficient and classification analysis must utilize more
measures to understand the process. Specifically, the use of
receiver operating characteristic (ROC) curves [71] is an
excellent direction. The area under the ROC curve is a
further summary measure that provides both a quantitative
understanding of the discrimination potential of the model
and a convenient measure to compare multiple classification
models. The third tool we introduced was the
confusion matrix. While ROC curves provide the potential
for correct classification of a binary rule at a time, confusion
matrices correspond to a particular point on the
ROC curve under the constraints of accuracy measures of
other classes. These also directly correspond to the final
segmentation of the rule image under an optimization
condition. The optimization condition may simply be the
maximization of the accuracy or may be the minimization
of certain types of errors.
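
A confusion matrix and the per-class sensitivity and specificity derived from it can be computed as in the minimal sketch below. It assumes integer class labels for the gold standard and for the classified pixels; the five-pixel example is hypothetical.

```python
import numpy as np


def confusion_matrix(truth, predicted, n_classes: int) -> np.ndarray:
    """Rows are gold-standard classes, columns are assigned classes."""
    truth = np.asarray(truth, dtype=int)
    predicted = np.asarray(predicted, dtype=int)
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (truth, predicted), 1)  # unbuffered add handles repeated index pairs
    return cm


def sensitivity_specificity(cm: np.ndarray, k: int):
    """Sensitivity and specificity for class k against all other classes."""
    tp = cm[k, k]
    fn = cm[k].sum() - tp
    fp = cm[:, k].sum() - tp
    tn = cm.sum() - tp - fn - fp
    return tp / (tp + fn), tn / (tn + fp)


truth = [0, 0, 1, 1, 2]
predicted = [0, 1, 1, 1, 0]
cm = confusion_matrix(truth, predicted, 3)
print(cm)
print(sensitivity_specificity(cm, 1))  # sensitivity 1.0, specificity 2/3
```

Unlike an AUC, every entry of this matrix depends on the decision rule that produced the final segmentation, so a different optimization condition yields a different matrix from the same rule data.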

Discriminant  and  class  assignment 

In  a  multi-class  analysis,  our  approach  to  evaluating  ROC 
curves  for  a  class  is  one  at  a  time,  i.e.,  all  other  classes  are 
essentially  lumped  in  the  rule  data  and  the  highest 
probability  of  the  lumped  ensemble  is  compared  to  the 
class  whose  ROC  curve  is  being  built.  Hence,  the  AUC 
values  must  be  regarded  as  a  potential  for  classification. 
They  are  best  suited  to  answer  the  binary  question  of 
whether  a  pixel  is  correctly  identified  or  not  when 
considering a single class. This method is ideally suited to
a cascaded classifier that resolves one class at a time. Such a classifier has not
been  reported  yet  but  would  provide  a  means  to  explicitly 
determine  the  error  for  any  given  classification  scheme. 
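
The one-at-a-time evaluation described above can be sketched as follows. The score for a class is taken here as its probability minus the highest probability among all other classes, which is one plausible reading of how the lumped ensemble is "compared" to the class of interest and is an assumption. The AUC is then computed directly as the probability that a pixel of the class outscores a pixel outside it, with ties counted as one half. The pairwise computation is quadratic in the number of pixels and is meant only for small illustrations.

```python
import numpy as np


def one_vs_rest_scores(rule: np.ndarray, k: int) -> np.ndarray:
    """rule has shape (n_pixels, n_classes); higher score favours class k."""
    others = np.delete(rule, k, axis=1)
    return rule[:, k] - others.max(axis=1)


def auc(scores: np.ndarray, is_class: np.ndarray) -> float:
    """Probability that a class pixel scores above a non-class pixel."""
    pos = scores[is_class]
    neg = scores[~is_class]
    diff = pos[:, None] - neg[None, :]  # every class/non-class pair
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)


rule = np.array([[0.80, 0.10, 0.10],
                 [0.60, 0.30, 0.10],
                 [0.20, 0.70, 0.10],
                 [0.45, 0.50, 0.05]])
truth = np.array([0, 0, 1, 2])
print(auc(one_vs_rest_scores(rule, 0), truth == 0))  # 1.0
```

Because all other classes are lumped, the value measures only how separable class 0 is in the rule data; it says nothing about how the remaining pixels are distributed among the other classes.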


Experimental  parameters  and  classification 

Here,  we  take  advantage  of  the  trading  rules  of  FTIR 
spectroscopy  and  imaging  to  model  the  effects  of  the 
experimental  parameters  on  the  classification  process. 
While  the  signal  to  noise  ratio  (SNR)  and  resolution  are 
generally  arbitrarily  fixed  in  most  studies,  we  demonstrate 
their  importance  in  classification. 

Effect  of  signal  to  noise  ratio 

There are two issues: first, what is the “best” SNR to formulate
algorithms and, second, given an algorithm, what is the
lowest SNR that would provide adequate classification. Only
the latter issue is examined here. As with conventional
FTIR spectrometers, imaging spectrometers obey the
trading rules of IR spectroscopy. Hence, if an n-fold
reduction in SNR provides the same results, data acquisition
will be n²-fold faster. Thus, in addition to revealing an interesting
fundamental behavior of the classifier, the role of SNR has
a direct bearing on the speed at which data are acquired.

Fig. 9 a Noise in the data set as a function of added random
noise. b Effect of spectral noise on the accuracy of classification
as measured by AUC values

We  examined  classification  accuracy  as  a  function  of 
average  spectral  noise.  To  strictly  examine  the  effect  of 
noise,  data  must  be  acquired  at  different  co-added  spectral 
numbers.  The  time  required  for  imaging  an  array  multiple 
times,  however,  is  prohibitive.  Hence,  we  computationally 
added  random,  Gaussian  noise  to  the  original  spectral  data. 
Peak-to-peak and root mean square (rms) noise were
measured in the 1,950- to 2,150-cm⁻¹ region adjacent to
the amide I peak.² Representative single pixel spectra from
the data sets are shown, as a function of noise, in Fig. 9a.
We  additionally  plotted  the  observed  noise  levels  against 
the  added  noise  to  verify  linearity  (plot  not  shown).  The 
linear  relationship  conforms  to  the  expected  result  and 
provides  a  scaling  factor  to  express  the  equivalent  reduction 
in  data  acquisition  time  (co-addition)  that  would  be  realized 
at  that  noise  level.  For  example,  the  addition  of  0.005  a.u. 
of  noise  raises  the  peak-to-peak  noise  from  0.0013  to 
0.015  a.u.,  corresponding  to  a  decrease  in  data  acquisition 
time  by  a  factor  of  ca.  100  for  this  data  set.  In  addition  to 
increasing  noise,  we  employed  an  algorithm  based  on  an 
MNF  transform  [72,  73]  to  mathematically  eliminate  noise. 
The  observed  peak-to-peak  noise  was  0.00017  a.u., 
corresponding  to  an  increase  in  data  acquisition  time  by  a 
factor greater than ca. 100. Hence, the data examined span
about 5 orders of magnitude in collection time.
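
The noise experiment can be sketched in a few lines. The example adds zero-mean Gaussian noise to a spectrum, measures peak-to-peak and rms noise in the 1,950- to 2,150-cm⁻¹ window after removing the mean, and converts a change in noise level into an equivalent change in acquisition time using the square-law trading rule. The flat synthetic baseline, the random seed, and the function names are assumptions for illustration; real data would generally need a sloped or curved baseline removed rather than a mean.

```python
import numpy as np


def add_gaussian_noise(spectra: np.ndarray, sigma: float, rng: np.random.Generator) -> np.ndarray:
    """Return a copy of the spectra with zero-mean Gaussian noise added."""
    return spectra + rng.normal(0.0, sigma, size=spectra.shape)


def baseline_noise(wavenumbers: np.ndarray, spectrum: np.ndarray,
                   lo: float = 1950.0, hi: float = 2150.0):
    """Peak-to-peak and rms noise in a window assumed free of absorption."""
    window = (wavenumbers >= lo) & (wavenumbers <= hi)
    residual = spectrum[window] - spectrum[window].mean()
    peak_to_peak = float(residual.max() - residual.min())
    rms = float(np.sqrt(np.mean(residual ** 2)))
    return peak_to_peak, rms


def time_factor(noise_before: float, noise_after: float) -> float:
    """Equivalent change in acquisition time, assuming noise ~ 1/sqrt(time)."""
    return (noise_after / noise_before) ** 2


rng = np.random.default_rng(0)             # fixed seed, illustration only
w = np.arange(1900.0, 2200.0, 0.1)         # hypothetical wavenumber axis
noisy = add_gaussian_noise(np.zeros_like(w), 0.005, rng)
print(baseline_noise(w, noisy))            # peak-to-peak is several times the rms
print(time_factor(0.0013, 0.015))          # roughly 130, the "ca. 100" quoted above
```

The peak-to-peak value always exceeds the rms value for the same window, which is why noise figures quoted on a peak-to-peak basis appear larger while following the same trend.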

² It is noteworthy that we are examining trends in the absorbance
spectra. Strictly, SNR should be measured in single beam spectra to
relate rigorously to theory. It can be shown, however, that the trend
will hold approximately for the absorbance spectra as well. Many
practitioners advocate the use of rms SNR. We are employing peak-to-peak
fluctuations over the same spectral range. Hence, the noise
values we obtain will be higher but will follow the same trend.

The average height of the amide I peak was 0.42 a.u. in
all cases, providing a SNR of 2,500 (MNF-corrected data)
to 1.5 for the data sets. Accuracy as a function of the noise
level is shown in Fig. 9b. While the x-error bars indicate the
standard deviation of noise levels in pixels, the y-error bars
indicate the standard deviation in AUC values of all ten
classes. As a general rule, the classification improves with
lower  noise  levels.  We  first  note  that  the  classification  does 
not  become  perfect  for  any  noise  level  and  there  is  a 
significantly  diminishing  return  in  increasing  the  SNR 
beyond  a  level.  At  the  other  end,  the  ability  to  distinguish 
classes  is  entirely  lost  at  levels  of  ca.  0.1.  Performance 
across  multiple  data  sets  observed  using  our  prediction 
model  indicates  that  the  increases  demonstrated  at  noise 
levels  lower  than  ca.  0.003  a.u.  are  within  the  variance. 
Hence,  there  is  little  benefit  to  decreasing  the  noise  levels 
below  ca.  0.003  a.u.  for  this  data  set,  or  to  increasing  the 
SNR beyond ca. 150. It must be emphasized that the model,
prediction algorithm, and discriminant function are intimately
linked in a non-linear manner. While this makes it
impossible to predict the behavior generally of all classification
approaches, this simple exercise may be conducted
to  determine  the  optimal  data  acquisition  parameters.  For 
our  selected  metrics  and  model,  it  appears  that  the  data 
acquisition  time  can  be  decreased  by  a  factor  of  ca.  3 
without  significant  degradation  in  accuracy. 

Spectral  resolution 

We  next  examined  the  effect  of  spectral  resolution  on  the 
results  that  would  be  obtained  using  the  developed 
algorithm.  As  in  the  previous  section,  the  data  were  not 
re-acquired  but  were  downsampled  from  acquired  data 
using  a  neighbor  binning  procedure.  Spectra  from  the  same 
epithelial  class  pixel,  at  different  resolutions  (Fig.  10a), 
demonstrate the effect of downsampling on feature definition.
Figure 10b demonstrates, first, that the peak-to-peak
noise levels over the region remain the same with spectral
resolution. As previously observed, noise is an important
control in comparing spectra; the peak-to-peak noise over
the same number of data points was preserved by neighbor
binning. In practice, the constant-throughput spectrometer
would provide a SNR (or noise level, in this case) that
decreases linearly with resolution. Since most array
detectors can be operated with higher integration times, it is
fair to assume that the time advantage in decreasing
resolution would be linear. Second, the performance of the
classifier is very nearly the same for finer spectral resolutions
and degrades significantly only at 32 cm-1. While the
results may appear to be surprising, a closer analysis of the
basis of the algorithms provides insight into the trends.

Fig. 10 a Spectra obtained by downsampling acquired data to
different resolutions using a neighbor binning procedure.
The inset demonstrates the effect ...
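
The downsampling step can be sketched as follows. The text does not spell out the neighbor binning procedure, so this example assumes the simplest reading: adjacent spectral points are averaged in groups of a fixed size. The function names are hypothetical, and the sketch does not reproduce any additional step the authors may have used to keep the peak-to-peak noise constant.

```python
def neighbor_bin(values, factor):
    """Average each run of `factor` adjacent points.

    A trailing partial bin is dropped so every output point averages the
    same number of inputs (an assumption of this sketch).
    """
    if factor < 1:
        raise ValueError("factor must be at least 1")
    n_bins = len(values) // factor
    return [
        sum(values[i * factor:(i + 1) * factor]) / factor
        for i in range(n_bins)
    ]


def bin_spectrum(wavenumbers, absorbance, factor):
    """Bin the wavenumber axis and the absorbance values consistently."""
    if len(wavenumbers) != len(absorbance):
        raise ValueError("axis and spectrum must have the same length")
    return neighbor_bin(wavenumbers, factor), neighbor_bin(absorbance, factor)


# Example: halve the sampling density of a five-point spectrum.
axis, spectrum = bin_spectrum(
    [1000.0, 1004.0, 1008.0, 1012.0, 1016.0],
    [0.10, 0.30, 0.50, 0.70, 0.90],
    2,
)
print(axis, spectrum)
```

Binning by a factor of two doubles the point spacing, so repeated application moves, for example, from 4 cm-1 spacing to 8, 16 and 32 cm-1.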

The  classifier  is  based  on  absorbance  and  center  of 
gravity  measures  of  the  peaks.  It  is  well  established  that 
absorbance  is  measured  accurately,  provided  that  the 
FWHH  of  the  peak  is  not  significantly  smaller  than  the 
resolution.  The  Ramsay  resolution  parameter,  cr,  is  a  useful 
measure  that  was  originally  developed  for  monochromators 
but  has  been  shown  to  be  applicable  to  FTIR  spectrometers 
as well [74]. Since most bands are broad and peak
absorbances are lower than ca. 0.7, absorbance values are not
expected to be adversely impacted by the measurement
process. With decreasing resolution, however, broadening
within complex peak shapes may lead to observed changes
in the apparent absorption at a specific wavenumber. The
change  itself  may  not  have  a  significant  influence  on  the 
classifier  performance  as  it  depends  on  several  such 
metrics.  A  second  type  of  metric  calculates  the  area  under 
the  curve.  This  is  not  expected  to  be  impacted  significantly 
for  most  peaks.  The  third  type  of  metric  we  have  used  is  the 
center  of  gravity  of  a  spectral  region.  While  spectral 
analyses  ordinarily  attempt  to  locate  the  peak  position  and 
use  it  as  a  metric,  we  chose  the  center  of  gravity  for  its 
sensitivity  to  both  position  and  asymmetrical  shape  changes 
in  complex  spectral  envelopes  observed  in  biological 
samples.  Since  the  classifier  is  based  on  center  of  gravity 
of  a  feature  and  not  on  the  wavenumber  of  the  peak 
maximum,  it  is  a  very  robust  measure  that  is  relatively 
unaffected  by  spectral  resolution  or  noise. 
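
Two of the metric types discussed above can be sketched in a few lines. The exact definitions used by the classifier are not given in the text, so this example assumes the center of gravity is the absorbance-weighted mean wavenumber over a spectral window and the area is a trapezoidal integral; the function names are hypothetical and no baseline correction is applied.

```python
def _window(wavenumbers, absorbance, low, high):
    """Return the (wavenumber, absorbance) pairs inside [low, high]."""
    return [(w, a) for w, a in zip(wavenumbers, absorbance) if low <= w <= high]


def center_of_gravity(wavenumbers, absorbance, low, high):
    """Absorbance-weighted mean wavenumber of a spectral region."""
    points = _window(wavenumbers, absorbance, low, high)
    total = sum(a for _, a in points)
    if total == 0:
        raise ValueError("region has zero total absorbance")
    return sum(w * a for w, a in points) / total


def band_area(wavenumbers, absorbance, low, high):
    """Trapezoidal area under the spectrum inside the region."""
    points = _window(wavenumbers, absorbance, low, high)
    return sum(
        0.5 * (a0 + a1) * abs(w1 - w0)
        for (w0, a0), (w1, a1) in zip(points, points[1:])
    )


axis = [1000.0, 1010.0, 1020.0, 1030.0, 1040.0]
symmetric = [0.0, 1.0, 2.0, 1.0, 0.0]
skewed = [0.0, 1.0, 2.0, 2.0, 0.0]
print(center_of_gravity(axis, symmetric, 1000.0, 1040.0))
print(center_of_gravity(axis, skewed, 1000.0, 1040.0))
print(band_area(axis, symmetric, 1000.0, 1040.0))
```

In this toy example the first maximum of both envelopes sits at 1020 cm-1, yet the center of gravity shifts when the envelope becomes asymmetric, which is the sensitivity to shape that motivates the choice of metric.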

Generalization  of  developed  algorithms  to  instruments 
and  practical  approaches 

The characterization of classification with regard to
spectrometer performance (SNR) and spectral resolution
provides information to optimize parameters on one spectrometer.
It is unclear, however, if the calibration would
transfer to another spectrometer. We contend that the
potential for a successful transfer is high as the classification
process is relatively insensitive to resolution, implying
that it would only be weakly sensitive to apodization or to
small inaccuracies in wavelength scale. Similarly, if the
SNR of acquired data is used as control, perturbations due
to fixed pattern noise in focal plane array detectors or the
different use of electronic filters by different manufacturers
are likely to be insignificant in classifying tissue correctly.
Various instrument manufacturers also set the nominal
optical resolution differently in their instruments. The issue
of spatial resolution, of course, is more complex. Nevertheless,
any resolution setting around the wavelength-limited
case will likely provide consistent results. To our
knowledge,  there  has  been  no  comparison  yet  of  classifier 
performance  across  mid-IR  FTIR  imaging  spectrometers 
using  algorithms  developed  on  one  specific  instrument.  The 
developed  protocol  provides  for  such  a  framework  and 
detailed  results  are  awaited  from  on-going  work  [75]. 

Outlook  and  prospects 

An  exciting  period  in  imaging  tissues  spectroscopically 
with  low  power,  optical  microscopy-comparable  resolution 
is  emerging.  Considerable  work,  however,  needs  to  be 
accomplished  before  this  idea  can  become  a  clinical  reality. 
An  ultimate  goal  of  such  studies  is  to  provide  a  key 
technology for emerging molecular pathology. The approach
promises greatly reduced error rates, automation,
and economic benefits in current pathology practice. Looking
to the future, chemical imaging approaches will be
employed for diagnosing cancers in pre-malignant stages
prior to their apparent changes observable by conventional
means, predicting the prognosis of the lesion and intraoperative
imaging in real-time. Fundamental studies in drug
discovery and mechanisms of molecular interactions are
further examples that would be enabled by progress in this
area. Doubtless, exciting applications lie ahead and progress
is rapidly being made towards practical applications
but much work needs to be done to carefully apply this
powerful  technology  to  multiple  aspects  of  pathology. 
Success  in  this  endeavor  promises  to  change  the  practice 
of  pathology  radically  and  alter  the  clinical  management  of 
cancer patients.

Abstract Prostate cancer accounts for one-third of noncutaneous cancers diagnosed in US
men  and  is  a  leading  cause  of  cancer-related  death.  Advances  in  Fourier  transform  infrared 
spectroscopic  imaging  now  provide  very  large  data  sets  describing  both  the  structural  and 
local  chemical  properties  of  cells  within  prostate  tissue.  Uniting  spectroscopic  imaging  data 
and  computer-aided  diagnoses  (CADx),  our  long  term  goal  is  to  provide  a  new  approach  to 
pathology by automating the recognition of cancer in complex tissue. The first step toward the
creation of such CADx tools requires mechanisms for automatically learning to classify tissue
types, a key step in the diagnosis process. Here we demonstrate that genetics-based machine
learning (GBML) can be used to approach such a problem. However, to efficiently analyze
this problem there is a need to develop efficient and scalable GBML implementations that are
able to process very large data sets. In this paper, we propose and validate an efficient GBML
technique, NAX, based on an incremental genetics-based rule learner. NAX exploits massive
parallelism via the message passing interface (MPI) and efficient rule-matching using
hardware-implemented operations. Results demonstrate that NAX is capable of performing
prostate tissue classification efficiently, making a compelling case for using GBML
implementations as efficient and powerful tools for biomedical image processing.

1 Introduction

Pathologist  opinion  of  structures  in  stained  tissue  is  the  definitive  diagnosis  for  almost  all 
cancers  and  provides  critical  input  for  therapy.  In  particular,  prostate  cancer  accounts  for 
one-third  of  noncutaneous  cancers  diagnosed  in  US  men.  Hence,  it  is,  appropriately,  the 
subject  of  heightened  public  awareness  and  widespread  screening.  If  prostate-specific 
antigen  (PSA)  or  digital  rectal  screens  are  abnormal,  a  biopsy  is  needed  to  definitively 
detect  or  rule  out  cancer.  Pathologic  status  of  biopsied  tissue  not  only  forms  the  definitive 
diagnosis  but  constitutes  an  important  cornerstone  of  therapy  and  prognosis.  There  is, 
however,  a  need  to  add  useful  information  to  diagnoses  and  to  introduce  new  technologies 
that allow economical cancer detection to focus limited healthcare resources. In pathology
practice, widespread screening results in a large workload of biopsied men, in turn placing
an increasing demand on services. Operator fatigue is well documented and guidelines limit
the  workload  and  rate  of  examination  of  samples  by  a  single  operator.  Importantly,  newly 
detected  cancers  are  increasingly  moderate  grade  tumors  in  which  pathologist  opinion 
variation  complicates  decision-making. 

For  the  reasons  above,  there  is  an  urgent  need  for  automated  and  objective  pathology 
tools. We have sought to address these requirements through novel Fourier transform
infrared (FTIR) spectroscopy-based, computer-aided diagnoses for prostate cancer and
to develop the required microscopy and software tools to enable their application. FTIR
spectroscopic  imaging  is  a  new  technique  that  combines  the  spatial  specificity  of  optical 
microscopy  and  the  biochemical  content  of  spectroscopy.  As  opposed  to  thermal  infrared 
imaging,  FTIR  imaging  measures  the  absorption  properties  of  tissue  through  a  spectrum 
consisting  of  (typically)  1024-2048  wavelength  elements  per  pixel.  Since  IR  spectra  reflect 
the  molecular  composition  of  the  tissue,  image  contrast  arises  from  differences  in 
endogenous  chemical  species.  As  opposed  to  visible  microscopy  of  stained  tissue  that 
requires  a  human  eye  to  detect  changes,  numerical  computation  is  required  to  extract 
information from IR spectra of unstained tissue. Extracted information, based on a computer
algorithm, is inherently objective and automated (Lattouf and Saad 2002; Fernandez
et  al.  2005;  Levin  and  Bhargava  2005;  Bhargava  et  al.  2006). 

Uniting  spectroscopic  imaging  data  and  computer-aided  diagnoses  (CADx),  we  seek  to 
provide  a  new  approach  to  pathology  by  automating  the  recognition  of  cancer  in  complex 
tissue. This is an exciting paradigm in which disease diagnoses are objective and reproducible,
yet do not require any specialized reagents or human intervention. The first step
toward the creation of such CADx tools requires mechanisms for reliable and automated
tissue type classification. In this paper we demonstrate how genetics-based machine
learning tools can achieve such a goal. Interpretability of the learned models and efficient
processing of very large data sets have led us to rule-based models, which are easy to interpret,
and to genetics-based machine learning, whose inherently massively parallel methods have the
scalability required to address very large data sets. We present the method and
the efficiency enhancement techniques proposed to address automated tissue classification.
When pushed beyond the relatively small problems traditionally used to test such
methods, the need for efficient and scalable implementations becomes a key research topic
that needs to be addressed. We designed the proposed technique with such constraints in
mind: a modified version of an incremental genetics-based rule learner that exploits
massive parallelism, via the message passing interface (MPI), and efficient rule
matching using hardware-oriented operations. We name this system NAX. NAX is compared
to traditional and genetics-based machine learning techniques on an array of publicly
available data sets. We also report the initial results achieved using the proposed technique
when classifying prostate tissue.

The  remainder  of  the  paper  is  structured  as  follows.  We  present  an  overview  of  the 
problem  addressed  in  Sect.  2,  paying  special  attention  to  tissue  classification.  We  discuss  in 
Sect.  3  the  hurdles  that  traditional  genetics-based  machine  learning  implementations  face 
when  applied  to  very  large  data  sets.  Section  4  presents  our  solution  to  those  hurdles.  We 
also describe the incremental rule learner proposed for tissue classification. We then
summarize results on publicly available data sets and the preliminary results for tissue
classification on a prostate tissue microarray in Sect. 5. Finally, in Sect. 6, we present
conclusions  and  further  work. 


2  Biomedical  imaging  and  data  mining 

This section presents an overview of the problem addressed in this paper. We first introduce
infrared spectroscopic imaging as a potentially powerful tool for cancer diagnosis and
prognosis. Then, we explore the protocols that provide raw high-quality data for data
mining. Finally, we conclude with the key task, tissue classification, focusing
on prostate tissue.


2.1 Infrared spectroscopy and imaging for cancer diagnosis and prognosis

Infrared  spectroscopy  is  a  well-established  molecular  technique  and  is  widely  used  in 
chemical  analyses.  The  fundamental  principle  governing  the  response  of  any  material  is 
that the vibrational modes of molecules are resonant in energy with photons in the
mid-infrared region (2-14 µm) of the electromagnetic spectrum. Hence, when photons of
energy that are resonant with the material’s molecular composition are incident, a number
are absorbed. The number absorbed is directly proportional to the number of chemical
species that are excited. Hence, any material has a characteristic frequency-dependent
absorption  profile  called  a  spectrum.  An  infrared  spectrum  is  often  termed  the  “optical 
fingerprint”  of  a  material  as  it  can  help  uniquely  identify  molecular  composition — see 
Fig.  1. 

Researchers, including us, have contributed to developing an imaging version of spectroscopy
that is essentially similar to an optical microscope. In this mode of spectroscopy,
images are acquired in the manner of optical microscopy with one important difference.
Instead of measuring the intensity of three colors for a visible image, several thousand
intensity values are acquired at each pixel in the image as a function of wavelength
(spectrum at each pixel). The resulting data set is three dimensional (2 spatial and 1
spectral indices) consisting typically of a size 256 × 256 × 1024, but extending to sizes
such as 3500 × 3500 × 2048. Since each data point is stored as a 16-bit number, the
data size typically runs into several tens to hundreds of gigabytes.

Fig. 1 Conventional staining and automated recognition by chemical imaging. (A) Typical H&E stained
sample, in which structures are deduced from experience by a human. Highlighting specific regions in the
manner of H&E is possible using FTIR imaging without stains. (B) Absorption at 1080 cm-1 commonly
attributed to nucleic acids and (C) to proteins of the stroma. The data obtained is 3 dimensional (D) from
which spectra (E, plotted against wavenumber in cm-1) or images at specific spectral features may be plotted
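
The data-cube sizes quoted in Sect. 2.1 can be checked with a short calculation. This is only an arithmetic sketch: the function name is hypothetical, and it assumes raw, uncompressed storage at two bytes (16 bits) per value with no file-format overhead.

```python
def cube_bytes(nx: int, ny: int, n_spectral: int, bytes_per_value: int = 2) -> int:
    """Raw size in bytes of an nx x ny image with n_spectral values per pixel."""
    return nx * ny * n_spectral * bytes_per_value


small = cube_bytes(256, 256, 1024)     # typical image quoted in the text
large = cube_bytes(3500, 3500, 2048)   # large image quoted in the text

print(small, "bytes, about", round(small / 1e6), "MB")
print(large, "bytes, about", round(large / 1e9, 1), "GB")
```

Under these assumptions a single 256 × 256 × 1024 cube occupies roughly 134 MB, while a 3500 × 3500 × 2048 cube occupies roughly 50 GB, so the tens-to-hundreds-of-gigabytes range applies to the large images and to collections of them.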


2.2  Mining  the  spectra:  Two  sequential  problems 

Though  the  continued  development  of  fast  FTIR  microspectroscopy  represents  an  exciting 
opportunity  for  pathology,  handling  the  resultant  data  and  rapidly  providing  classifications 
remains a critical challenge. First, the sheer volume of data (potentially larger than 10 GB
a day) represents an organizational and retrieval challenge. Second, extraction of useful
information  in  short  time  periods  requires  the  formulation  of  optimal  protocols.  Third,  the 
automated  cancer  segmentation  problem  is  very  complex  and  offers  a  number  of  routes  and 
levels  of  data  that  need  to  be  analyzed  to  determine  the  optimal  approach  for  application  in 
a  laboratory. 

The  typical  application  is  the  need  to  extract  information  from  the  data  set  such  that  it  is 
clinically  relevant.  Hence,  the  output  of  the  data  mining  algorithm  to  be  developed  is  well- 
bounded  and  clearly  defined.  For  example,  in  the  prostate  there  are  two  levels  of  interest.  In 
the  first  level,  the  pathologist  examines  the  tissue  to  determine  if  there  are  any  epithelial 
cells. Since more than 95% of prostate cancers arise in epithelial cells, transformations in
this class of cells form the diagnostic basis and a primary determinant of therapy. Other
cell types of interest are lymphocytes that may indicate inflammation, blood vessel density
that may indicate the development of new blood supply indicative of cancer growth and
nerves that may be invaded by cancer cells. Hence, any automated approach to pathology
must first identify cell types accurately. The second step in pathology follows. Once
epithelial cells are located, their spatial patterns are indicative of disease states. In our
imaging  approach,  we  can  identify  both  spatial  patterns  as  well  as  chemical  patterns  in 
epithelial  cells.  Hence,  the  task  would  be  to  use  either  or  both  to  classify  disease.  In  this 
paper,  we  focus  only  on  the  accurate  identification/classification  of  tissue  types  as  the  first 
step  of  the  path  that  leads  to  obtaining  the  correct  pixels  of  epithelium. 


2.3  Tissue  classification  for  prostate  arrays 

Prostate  tissue  is  structurally  complex,  consisting  primarily  of  glandular  ducts  lined  by 
epithelial  cells  and  supported  by  heterogeneous  stroma.  This  tissue  also  contains  blood 
vessels,  blood,  nerves,  ganglion  cells,  lymphocytes  and  stones  (which  are  comprised  of 
luminal secretions of cellular debris) that organize into structures measuring from tens to
hundreds of microns. These structures are readily observable within stained tissue using
bright-field microscopy at low to medium magnifications. Hence, in applying FTIR
imaging (Levin and Bhargava 2005), we obtain the common structural detail employed
clinically and, additionally, spectral information indicative of tissue biochemistry. As
histologic classes contain identical chemical components, infrared vibrational spectra are
similar but reveal small differences in specific absorbance features. The technique proposed
by Fernandez et al. (2005) examines each cell type’s spectra and transforms each
spectrum into a vector of descriptive features, usually numbering in the hundreds. A complete
description of this process is beyond the scope of this paper and can be found elsewhere
(Fernandez et al. 2005). Each pixel (cell present in the slice of microarray under analysis)
has an assigned spatial position in the array while the tissue type is assigned by a highly
experienced pathologist. Thus, the tissue classification can be cast into a supervised
classification problem (Mitchell 1997), where all the attributes are real-valued and the class
is the tissue type (ten classes: epithelium, fibrous stroma, mixed stroma, muscle, stone,
lymphocytes, endothelium, nerve, ganglion, and blood). Figure 2 presents tissue types that
can be assigned by the pathologist by examining a stained image obtained after FTIR
microspectroscopy on the unstained tissue. Each marked pixel in the image becomes a training
example; hence, even the smallest data set usually contains hundreds of thousands of records
per array.


3  Larger,  bigger,  and  faster  genetics-based  machine  learning 

Bernado et al. (2001) presented a first empirical comparison between genetics-based
machine learning (GBML) techniques and traditional machine learning approaches. The
authors reported that GBML techniques were as competent as traditional techniques. Later,
Bacardit and Butz (2006) repeated the analysis, obtaining similar results. Most of the
experiments presented in both papers used publicly available data sets provided by the
University of California at Irvine repository (Merz and Murphy 1998). Most of the data
sets are defined over tens of features and, in the larger cases, up to a few thousand records.
However, key properties of GBML approaches are their intrinsic massive parallelism
and scalability. Cantu-Paz (2000) presented how efficient and accurate genetic
algorithms could be assembled, and Llora (2002) presented how such algorithms can be
efficiently used for machine learning and data mining. However, there are elements that
need to be revisited when we want to efficiently apply GBML techniques to large data sets
such as the one described in the previous section.


Springer


X. Llora et al.


Fig. 2 Tissue labeling provided by a pathologist for a biopsy section of human prostate
tissue. Each spot represents the section of a needle. Different colors represent different tissue types


GBML techniques require evaluating candidate solutions against the original data
set, that is, matching the candidate solutions (e.g., rules, decision trees, prototypes) against all
the instances in the data set. Regardless of the flavor used, Llora and Sastry (2006)
showed that, as the problem grows, rule matching governs the execution time. For small
data sets (tens of attributes and a few thousand records) the matching process takes
more than 85% of the overall execution time, marginalizing the contribution of the other
genetic operators. This number increases to 98% and above when we move to data sets
with a few hundred attributes and a few hundred thousand records; almost all
of the time is then spent evaluating candidate solutions. Each evaluation can be computed in
parallel. Moreover, the evaluation process may also be parallelized on very large data
sets by splitting and distributing the data across the computational resources. A detailed
description of the parallelization alternatives of GBML techniques can be found elsewhere
(Llora 2002).
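
The data-splitting idea can be sketched as a partition-and-reduce computation. This is not the MPI implementation described in the paper: the sketch uses Python threads from the standard library purely to show the structure (CPython threads will not give a real speed-up for this CPU-bound work), and the rule format, function names and toy data are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def matches(rule, record):
    """True if every attribute lies strictly inside its (low, high) interval."""
    return all(low < x < high for (low, high), x in zip(rule, record))


def count_correct(rule, label, chunk):
    """Count the records in one data chunk that the rule matches and labels correctly."""
    return sum(1 for record, cls in chunk if cls == label and matches(rule, record))


def split(data, n_chunks):
    """Split the data set into at most n_chunks contiguous pieces."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_count(rule, label, data, n_workers=4):
    """Evaluate one rule by summing the partial counts from each data chunk."""
    chunks = split(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial = pool.map(lambda chunk: count_correct(rule, label, chunk), chunks)
        return sum(partial)


data = [([1.5], "a"), ([5.0], "a"), ([1.7], "b"), ([2.0], "a")]
print(parallel_count([(1.0, 2.3)], "a", data, n_workers=3))
```

Each worker only needs its own slice of the records and returns a single number, which is why the evaluation step distributes well across separate processors.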

Currently available off-the-shelf GBML methods and software distributions (Barry
and Drugowitsch 1997; Llora 2006) do not usually target large data sets. The two main
bottlenecks are large memory footprints and sequential-processing oriented processes.
Generally speaking, they were designed to run on single processor machines with
enough memory to fit the entire data set. Hence, designers did not pay much
attention to the memory footprint required to store the data set (usually completely
loaded into memory) and the population of candidate solutions. These large complex
structures were geared to facilitate the programming effort, but they are not designed
toward the efficient evaluation of the candidate solutions. However, efforts have been
made to push GBML methods into domains which require processing large data sets.
Three different works need to be mentioned here. Flockhart (1995) proposed and
implemented GA-MINER, one of the earliest efforts to create data mining systems based
on GBML systems that scale across symmetric multi-processors and massively parallel
multi-processors. Flockhart (1995) reviewed different encoding and parallelization
schemes and conducted proper scalability studies. Llora (2002) explored how fine-grained
parallel genetic algorithms could become efficient models for data mining.
Theoretical analyses of performance and scalability were developed and validated with
proper simulations. Recently, Llora and Sastry (2006) explored how current hardware
can efficiently speed up rule matching against large data sets. These three approaches
are the basis of the incremental rule learning proposed in the next section to approach
very large data sets.

Another important issue in real-world problems is the class distribution. Usually
most real problems have a clear class imbalance. Recently, Orriols-Puig and
Bernado-Mansilla (2006) have revisited this issue, showing how GBML techniques successfully
learn and maintain proper descriptions for those minority classes. If a method is not designed
properly, descriptions of majority classes will tend to govern the learned models,
starving the description of minority classes. Prostate tissue classification is a clear
example of extreme class imbalance. Figure 3 presents the tissue type class distribution.
The smallest tissue type has 64 records, whereas the larger classes have several tens of
thousands of records; hence, the developed approaches must account for class size
variation.
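
A class distribution of this kind can be inspected with a few lines of code. The sketch below uses hypothetical function names and a tiny invented label list, not the prostate data; it sorts the classes by count, as in the count-sorted plot of the distribution, and reports the ratio between the largest and smallest class.

```python
from collections import Counter


def class_distribution(labels):
    """Return (class, count) pairs sorted from most to least frequent."""
    return Counter(labels).most_common()


def imbalance_ratio(labels):
    """Ratio between the sizes of the largest and the smallest class."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("no labels given")
    return max(counts.values()) / min(counts.values())


# Invented toy labels, for illustration only.
labels = ["epithelium"] * 6 + ["stone"] * 2 + ["nerve"] * 3
print(class_distribution(labels))
print(imbalance_ratio(labels))
```

For the data set described in the text, a smallest class of 64 records against classes with several tens of thousands of records gives a ratio of several hundred to one.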


Fig. 3 Tissue class distribution, plotted against the tissue type index after count sorting. Once the
classes are reordered according to their frequency in the data set, we can easily appreciate the extreme
imbalance: the smallest tissue type has 64 records, whereas the larger classes have several tens of
thousands of records


4 The road to tractability

We describe in this section the steps we took to design a GBML method (NAX) able to deal
with very large data sets with class imbalance. NAX evolves, one at a time, maximally
general and maximally accurate rules. Then, the covered instances are removed and another
maximally general and maximally accurate rule is evolved and added to the previously
stored ones, forming a decision list. This process continues until no uncovered instances are
left; it is also referred to as the sequential covering procedure (Cordon et al.
2001). Llora et al. (2005) showed that maximally general and maximally accurate rules
(Wilson  1995)  could  also  be  evolved  using  Pittsburgh-style  Learning  Classifier  Systems. 
Later,  Llora  et  al.  (2007)  showed  that  competent  genetic  algorithms  (Goldberg  2002) 
evolve  such  rules  quickly,  reliably,  and  accurately.  The  rest  of  this  section  describes  (1) 
efficient  implementation  techniques  to  deal  with  very  large  data  sets,  (2)  the  impact  of  class 
imbalance,  and  (3)  the  NAX  algorithm  proposed. 
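
The sequential covering loop described above can be sketched as follows. The rule-learning step in NAX is a genetic algorithm; here it is replaced by a deliberately trivial, hypothetical stand-in (`bounding_box_learner`) so that the example stays self-contained. Interval tests are inclusive in this toy so that the box always covers the instances it was built from, and all names are assumptions of the sketch.

```python
from collections import Counter


def matches(bounds, record):
    """True if every attribute lies inside its inclusive [low, high] interval."""
    return all(low <= x <= high for (low, high), x in zip(bounds, record))


def bounding_box_learner(examples):
    """Toy stand-in for the evolutionary search: a box around the majority class."""
    majority = Counter(label for _, label in examples).most_common(1)[0][0]
    members = [record for record, label in examples if label == majority]
    bounds = [(min(column), max(column)) for column in zip(*members)]
    return bounds, majority


def sequential_covering(examples, learn_one_rule=bounding_box_learner):
    """Learn rules one at a time, removing the instances each new rule covers."""
    decision_list = []
    remaining = list(examples)
    while remaining:
        bounds, label = learn_one_rule(remaining)
        uncovered = [(r, c) for r, c in remaining if not matches(bounds, r)]
        if len(uncovered) == len(remaining):
            break  # the learner covered nothing new; stop instead of looping forever
        decision_list.append((bounds, label))
        remaining = uncovered
    return decision_list


def classify(decision_list, record):
    """Return the class of the first matching rule, or None if no rule matches."""
    for bounds, label in decision_list:
        if matches(bounds, record):
            return label
    return None


rules = sequential_covering([([1.0], "a"), ([2.0], "a"), ([5.0], "b")])
print(rules)
print(classify(rules, [1.5]), classify(rules, [9.0]))
```

Because the rules are stored in the order in which they were learned and the first match wins, the result behaves as a decision list rather than an unordered rule set.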


4.1  Efficient  implementations 

As  introduced  earlier,  when  dealing  with  very  large  data  sets,  and  regardless  of  the  flavor 
of  the  GBML  technique  used,  we  may  spend  up  to  98%  of  the  computational  cycles  trying 
to match rules to the original data set (Llora and Sastry 2006). Each solution evaluation is
independent of the others and, hence, can be computed in parallel. Moreover, even the
matching of a rule (the representation we will use from now on) is highly parallel,
since conditions require performing simultaneous checks against different attributes per
record. Thus, an efficient implementation can take advantage of parallelizing both elements.


4.1.1  Exploiting  the  hardware  acceleration 

Recently, multimedia and scientific applications have pushed CPU manufacturers to include
support for vector instructions again in their processors. Both application areas require
heavy calculations based on vector arithmetic. Simple vector operations such as add or
product are repeated over and over. During the 1980s and 1990s supercomputers, such as Cray
machines, were able to issue hardware instructions that enabled basic vector arithmetic. A
more constrained scheme, however, has made its way into general-purpose processors
thanks to the push of multimedia and scientific applications. The main chip manufacturers
(IBM, Intel, and AMD) have introduced vector instruction sets (Altivec, SSE3, and
3DNow+) that allow vector operations over packs of 128 bits in hardware. We will focus
on a subset of instructions that are able to deal with floating point vectors. This subset of
instructions manipulates groups of four floating-point numbers. These instructions are the
basis of the fast rule matching mechanism proposed.

Our goal is to evolve a set of rules that correctly classifies the current data set from
prostate tissue. Using a knowledge representation based on rules allows us to inspect the
learned model, gaining insight into the biological problem as well. All the attributes of the
domain are real-valued and the conditions of the rules need to be able to express conditions
in $\mathbb{R}^n$ spaces. We use a rule encoding similar to the one proposed by Wilson (2000b),
a variation of the original work proposed by Wilson (2000a) and later reviewed by Stone and
Bull (2003), which is widely used in the GBML community. Rules express the conjunction of
tests across attributes. Each test may be defined in multiple flavors but, without loss of
generality, we picked a simple interval-based one. A simple example of an if-then rule
could be expressed as follows:

$$1.0 < a_0 < 2.3 \;\wedge\; \cdots \;\wedge\; 10.0 < a_n < 23 \;\rightarrow\; c_1 \qquad (1)$$

where the condition is the conjunction of the different attribute tests and the outcome is the
predicted class, a tissue type. We also allow a special condition, don't care, which
always returns true, allowing condition generalization. The rule below illustrates an
example of a generalized rule.

$$1.0 < a_0 < 2.3 \;\wedge\; -3.0 < a_3 < 2 \;\rightarrow\; c_1 \qquad (2)$$


All attributes except $a_0$ and $a_3$ were marked as don't care.

Each condition can be encoded using two floating-point numbers, where $\alpha_i$
contains the lower bound of the condition and $\omega_i$ its upper bound. Thus, the condition
$\alpha_i < a_i < \omega_i$ only requires storing two floating-point numbers. For efficiency
reasons we store them in two separate vectors, one containing the lower bounds and the other
containing the upper bounds. The position in a vector indicates the attribute being tested. The
don't care condition is simply encoded as $\alpha_i > \omega_i$ and, hence, we do not need to
store any extra information.
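The encoding described above can be sketched in C as follows. This is a minimal illustration rather than the NAX source: the names `IntervalRule`, `rule_set_dont_care`, `rule_matches`, and `example_rule`, as well as the fixed `N_ATTR`, are hypothetical. Bounds are treated as inclusive, as in the test of Fig. 4.

```c
#include <assert.h>
#include <stddef.h>

#define N_ATTR 4  /* assumed number of attributes, for illustration only */

/* Hypothetical rule layout: one vector of lower bounds, one of upper bounds.
   The position in each vector identifies the attribute being tested. */
typedef struct {
    float lower[N_ATTR];
    float upper[N_ATTR];
    int   predicted_class;
} IntervalRule;

/* Mark attribute i as "don't care" by storing lower > upper. */
static void rule_set_dont_care(IntervalRule *r, size_t i) {
    assert(i < N_ATTR);
    r->lower[i] = 1.0f;
    r->upper[i] = 0.0f;
}

/* Returns 1 if every attribute test holds, 0 otherwise. */
static int rule_matches(const IntervalRule *r, const float *instance) {
    for (size_t i = 0; i < N_ATTR; i++) {
        int dont_care = r->lower[i] > r->upper[i];
        int inside = instance[i] >= r->lower[i] && instance[i] <= r->upper[i];
        if (!dont_care && !inside)
            return 0;  /* one failed test falsifies the conjunction */
    }
    return 1;
}

/* Builds rule (2) from the text: tests on a0 and a3, everything else
   marked as don't care, predicting class 1. */
static IntervalRule example_rule(void) {
    IntervalRule r;
    for (size_t i = 0; i < N_ATTR; i++)
        rule_set_dont_care(&r, i);
    r.lower[0] = 1.0f;  r.upper[0] = 2.3f;
    r.lower[3] = -3.0f; r.upper[3] = 2.0f;
    r.predicted_class = 1;
    return r;
}
```

Because a don't care test is recognized from the bounds themselves, no per-attribute flag has to be stored next to the two vectors.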

Matching a rule requires performing the individual condition tests before the final logical
AND can be computed. Vector instruction sets improve the performance of this process
by performing four operations at once. In effect, the process may be regarded as four
pipelines running in parallel. It can be further improved by stopping the matching
process as soon as one test fails, since a single failed test makes the whole condition false.
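The four-at-a-time test with early stopping can be sketched as below. This is an illustrative variant, not the code of Fig. 5: it uses unaligned loads and float-mask SSE intrinsics, and falls back to a scalar loop when SSE is unavailable. The function names `match4` and `match_rule_vec` are hypothetical, and the attribute count is assumed to be padded to a multiple of four with don't care entries.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics: _mm_loadu_ps, _mm_cmple_ps, ... */
#endif

/* Tests four attributes at once. Returns 1 if all four tests pass.
   A lane passes if it is "don't care" (lb > ub) or lb <= x <= ub. */
static int match4(const float *lb, const float *ub, const float *x) {
#if defined(__SSE__)
    __m128 vlb = _mm_loadu_ps(lb);
    __m128 vub = _mm_loadu_ps(ub);
    __m128 vx  = _mm_loadu_ps(x);
    __m128 active  = _mm_cmple_ps(vlb, vub);            /* lb <= ub */
    __m128 outside = _mm_or_ps(_mm_cmplt_ps(vx, vlb),   /* x < lb   */
                               _mm_cmpgt_ps(vx, vub));  /* x > ub   */
    __m128 fail = _mm_and_ps(active, outside);
    return _mm_movemask_ps(fail) == 0;  /* no lane failed */
#else
    for (int i = 0; i < 4; i++)
        if (lb[i] <= ub[i] && (x[i] < lb[i] || x[i] > ub[i]))
            return 0;
    return 1;
#endif
}

/* Matches a whole rule; n must be a multiple of 4 (pad with don't care). */
static int match_rule_vec(const float *lb, const float *ub,
                          const float *x, size_t n) {
    assert(n % 4 == 0);
    for (size_t k = 0; k < n; k += 4)
        if (!match4(lb + k, ub + k, x + k))
            return 0;  /* early exit: stop at the first failing group */
    return 1;
}
```

The early exit works at the granularity of a group of four attributes, so a failing test in the first group avoids loading the rest of the rule.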

Figure 4 presents a C implementation of the rule matching process.
The code assumes that the two vectors containing the upper and lower bounds are provided
and that records are stored in a two-dimensional matrix. Figure 5 presents the vectorized
implementation of the code presented in Fig. 4 using SSE2 instructions. Exploiting the
available hardware can speed up the matching process by a factor of 3 to 3.5, as also shown
elsewhere (Llora and Sastry 2006).


4.1.2  Massive  parallelism 

Since most of the time is spent on the evaluation of candidate rules when dealing with large
data sets, our next goal was to find a parallelization model that could take advantage of this
peculiarity. Due to the quasi embarrassingly parallel (Grama et al. 2003) nature of the
candidate rule evaluation, we designed a coarse-grain parallel model for distributing the
evaluation load. Cantu-Paz (2000) proposed several schemes, showing the importance of
the trade-off between computation time and time spent communicating. When designing
the parallel model, we focused on minimizing the communication cost. Usually, a feasible
solution is a master/slave scheme, provided the computation time is much larger than the
communication time. However, GBML approaches tend to use rather large populations,
forcing us to send rule sets to the evaluation slaves and collect the resulting fitness. These
schemes also enlarge the sequential sections that cannot be parallelized, threatening the
overall speedup of the parallel implementation as a result of Amdahl's law (Amdahl 1967).

To minimize such communication cost, each processor runs an identical NAX algorithm.
All processors are seeded in the same manner and hence perform the same genetic operations,
differing only in the portion of the population being evaluated. Thus, the population is
 1. void match_seq_rule_set ( RuleSet * rs, InstanceSet is, int iDim, int iRows ) {
 2.   int i, j, k, iCnt, iClsIdx, iGround, iPred;
 3.   register int iMatcheable;
 4.   Instance ins;
 5.
 6.   iClsIdx = rs->iCorrectedDim;
 7.   clean_fitness_rules_set(rs);
 8.   for ( i=0 ; i<iRows ; i++ ) {
 9.     ins = is[i];
10.     iPred = -1;
11.     for ( j=0 ; iPred==-1 && j<rs->iLen ; j++ ) {
12.       iMatcheable = 1;
13.       for ( iCnt=0, k=j*(rs->iCorrectedDim+VBSIF) ;
14.             iMatcheable && k<j*(rs->iCorrectedDim+VBSIF)+rs->iDim ;
15.             k++, iCnt++ ) {
16.         iMatcheable = iMatcheable &&
17.           !( (rs->pfLB[k]<=rs->pfUB[k]) &&
18.              ( ins[iCnt]<rs->pfLB[k] || ins[iCnt]>rs->pfUB[k] ) );
19.       }
20.       if ( iMatcheable )
21.         iPred = rs->pfLB[j*(rs->iCorrectedDim+VBSIF)+rs->iCorrectedDim];
22.     }
23.     iPred = (iPred==-1)?rs->iClasses:iPred;
24.     iGround = (int)ins[iClsIdx];
25.     rs->pConfMat[iGround][iPred]++;
26.   }
27. }


Fig. 4 This figure presents a sequential implementation of the rule matching process in C. A rule set is
matched against a data set. Lines 16, 17, and 18 implement the condition test for one attribute. The
implementation also computes the confusion matrix that contains the ground truth versus the predicted class


treated as a collection of chunks where each processor evaluates its own assigned chunk,
sharing the fitness of the individuals in its chunk with the rest of the processors. Fitness
values can be packed and broadcast together, maximizing the occupation of the underlying
frames used by the network infrastructure. Moreover, this approach also removes the need
for sending the actual rules back and forth between processors, as a master/slave approach
would require, thus reducing the communication to the bare minimum: the fitness.
Figure 6 presents a conceptual scheme of the parallel architecture of NAX.

To implement the model presented in Fig. 6, we used C and the message passing interface
(MPI), specifically the Open MPI implementation (Gabriel et al. 2004). Figure 7 shows the
code in charge of the parallel evaluation. Each processor computes which individuals are
assigned to it. Then it computes their fitness and, finally, it broadcasts the computed
fitness. The rest of the process is left untouched and, apart from the cooperative evaluation,
all the processors end up generating the same evolutionary trace.
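The assignment of individuals to processors can be isolated in a small helper. The sketch below follows the partition used in Fig. 7 (equal fragments, with the last process also taking the remainder); the `Chunk` type and the function name `chunk_for` are hypothetical, and no MPI calls are involved.

```c
#include <assert.h>

/* Half-open range [first, last) of individuals assigned to one process. */
typedef struct { int first; int last; } Chunk;

/* Equal fragments of pop_size / n_procs individuals; the last process
   also takes the remainder when the division is not exact. */
static Chunk chunk_for(int pop_size, int n_procs, int rank) {
    assert(n_procs > 0 && rank >= 0 && rank < n_procs);
    int frag = pop_size / n_procs;
    Chunk c;
    c.first = rank * frag;
    c.last  = (rank + 1 == n_procs) ? pop_size : (rank + 1) * frag;
    return c;
}
```

With a population of 10 individuals and 3 processes, the ranges are [0, 3), [3, 6), and [6, 10). Since the last chunk can hold up to `n_procs - 1` extra individuals, any buffer that receives it should be sized for the largest chunk rather than for `frag`.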


4.2  Rule  sets  as  individuals 

One  main  characteristic  of  the  so-called  Pittsburgh-style  learning  classifier  systems — a 
particular  type  of  GBML — is  that  individuals  encode  a  rule  set  (Goldberg  1989;  Llora  and 
Garrell  2001;  Goldberg  2002).  Thus,  evolutionary  mechanisms  directly  recombine  one  rule 
set  against  another  one.  For  classification  tasks  of  moderate  complexity,  the  rule  sets  are 
#define VEC_MATCH(vecFLB,fLB,vecFUB,fUB,vecINS,fIN,vecTmp,vecOne,vecRes) {\
    vecFLB = _mm_load_ps(fLB);\
    vecFUB = _mm_load_ps(fUB);\
    vecINS = _mm_load_ps(fIN);\
    \
    vecRes = (__m128i)_mm_cmpgt_ps(vecFUB,vecFLB);\
    vecTmp = _mm_or_si128(\
        (__m128i)_mm_cmpgt_ps(vecFLB,vecINS),\
        (__m128i)_mm_cmpgt_ps(vecINS,vecFUB)\
    );\
    vecRes = _mm_andnot_si128(_mm_and_si128(vecRes,vecTmp),vecOne);\
}

void match_rule_set ( RuleSet * rs, InstanceSet is, int iDim, int iRows ) {
    int i, j, k, iCnt, iClsIdx, iGround, iPred;
    register int iMatcheable;
    Instance ins;
    __m128i vecRes, vecTmp, vecOne;
    __m128 vecFLB, vecFUB, vecINS;

    vecOne = (__m128i){-1,-1};
    iClsIdx = rs->iCorrectedDim;
    clean_fitness_rules_set(rs);
    for ( i=0 ; i<iRows ; i++ ) {
        // Classify the instance
        ins = is[i];
        iPred = -1;
        for ( j=0 ; iPred==-1 && j<rs->iLen ; j++ ) {
            iMatcheable = 1;
            for ( iCnt=0, k=j*(rs->iCorrectedDim+VBSIF) ;
                  iMatcheable && k<j*(rs->iCorrectedDim+VBSIF)+rs->iDim ;
                  k+=VBSIF, iCnt+=VBSIF ) {
                VEC_MATCH(vecFLB,&(rs->pfLB[k]),
                          vecFUB,&(rs->pfUB[k]),
                          vecINS,&(ins[iCnt]),vecTmp,vecOne,vecRes);
                iMatcheable = 0xFFFF==_mm_movemask_epi8(vecRes);
            }
            if ( iMatcheable )
                iPred = rs->pfLB[j*(rs->iCorrectedDim+VBSIF)+rs->iCorrectedDim];
        }
        iPred = (iPred==-1)?rs->iClasses:iPred;
        iGround = (int)ins[iClsIdx];
        rs->pConfMat[iGround][iPred]++;
    }
}


Fig.  5  This  figure  presents  a  vectorized  implementation  of  the  rule  matching  process  presented  in  Fig.  4. 
Lines  1-12  implement  the  parallelized  test  against  four  attributes  using  vector  instructions.  The  code  is 
written  using  C  intrinsics  for  SSE2  compatible  architectures.  This  code  runs  on  P4  or  newer  Intel  processors 
and  Opteron  or  Athlon  64  AMD  processors 


not  large.  However,  for  complex  problems,  the  potential  number  of  required  rules  to  ensure 
proper  classification  may  need  large  amounts  of  memory  that  become  prohibitive.  The 
requirements  increase  even  further  in  the  presence  of  noise  (Llora  and  Goldberg  2003). 
Fig. 6 This figure illustrates the parallel model implemented. Each processor runs the same
NAX algorithm. They only differ in the portion of the population being evaluated. The population is treated as
a collection of chunks where each processor evaluates its own assigned chunks, sharing the fitness of these
individuals with the rest of the processors. This approach minimizes the communication cost


Parallelization may not help much if we need to send large rule sets across the
communication network. For such reasons, GBML techniques work very well on problems of
moderate complexity (Bernado et al. 2001; Bacardit and Butz 2006). However, they need
to be modified to deal with complex and large data sets, and to avoid the limits
imposed by the issues mentioned above.


4.3  NAX:  Incremental  rule  learning  for  very  large  data  sets 

An incremental rule learning approach may alleviate memory footprint requirements by
evolving only one rule at a time. However, one rule by itself cannot solve complex
problems. For this reason, each evolved rule is added to the final rule set, and the
examples it covers are removed from the current training set.
The process is repeated until no instances are left in the training set. This approach,
already introduced by Cordon et al. (2001) and later also used by Bacardit and Krasnogor
(2006), keeps the memory footprint relatively small, making it feasible to process large
data sets such as the prostate tissue classification data set. However, an incremental approach
to the construction of the rule set requires paying special attention to the way rules are
evolved. For each run of the genetic algorithm used to evolve a rule, we would like to
obtain a maximally general and maximally accurate rule, that is, a rule that covers the
maximum number of examples without making mistakes (Wilson 1995).
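The incremental loop described above can be sketched on a one-dimensional toy problem. Everything here is an assumption made for illustration: the `Rule` and `Example` types, and in particular `evolve_rule`, which stands in for the genetic algorithm by picking the best rule from a fixed candidate list instead of evolving one.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float lo, hi; int cls; } Rule;          /* 1-D interval rule */
typedef struct { float x; int cls; int active; } Example;

static int covers(const Rule *r, const Example *e) {
    return e->x >= r->lo && e->x <= r->hi;
}

/* Stand-in for the genetic algorithm (hypothetical): returns the index of
   the candidate that correctly covers most remaining examples, or -1 if
   no candidate correctly covers any of them. */
static int evolve_rule(const Rule *cand, size_t n_cand,
                       const Example *ex, size_t n_ex) {
    int best = -1, best_score = 0;
    for (size_t c = 0; c < n_cand; c++) {
        int score = 0;
        for (size_t i = 0; i < n_ex; i++)
            if (ex[i].active && covers(&cand[c], &ex[i]) &&
                ex[i].cls == cand[c].cls)
                score++;
        if (score > best_score) { best_score = score; best = (int)c; }
    }
    return best;
}

/* Incremental loop: evolve one rule, append it, drop the covered examples.
   Returns the number of rules written to out (at most max_rules). */
static size_t learn_rule_set(const Rule *cand, size_t n_cand,
                             Example *ex, size_t n_ex,
                             Rule *out, size_t max_rules) {
    size_t n_rules = 0;
    while (n_rules < max_rules) {
        int best = evolve_rule(cand, n_cand, ex, n_ex);
        if (best < 0) break;                 /* nothing left to cover */
        out[n_rules++] = cand[best];
        for (size_t i = 0; i < n_ex; i++)    /* remove covered examples */
            if (ex[i].active && covers(&cand[best], &ex[i]))
                ex[i].active = 0;
    }
    return n_rules;
}
```

Only one rule is under construction at any time, and the training set shrinks after every iteration, which is where the reduced memory footprint comes from.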
 1. void evaluate_population ( Population * pp, InstanceSet is, int iDim, int iRows )
 2. {
 3.   int i;
 4.
 5.   /* Compute the fragments of this processor */
 6.   int iFrag = pp->iLen/FCS_processes;
 7.   int iInit = FCS_process_id*iFrag;
 8.   int iLast = (FCS_process_id+1==FCS_processes)?
 9.               pp->iLen:
10.               (FCS_process_id+1)*iFrag;
11.   int iCnt = 0;
12.   int j,k,l;
13.
14.   /* Create the bucket for the broadcast */
15.   float faFit[2*iFrag];
16.   float faTmp[2*iFrag];
17.
18.   /* Evaluate the given chunk assigned to the processor */
19.   for ( i=iInit,iCnt=0 ; i<iLast ; i++,iCnt++ ) {
20.     match_rule_set(pp->prs[i],is,iDim,iRows);
21.     compute_raw_accuracy_fitness_rule_set(pp->prs[i]);
22.     faFit[iCnt] = pp->prs[i]->fFitness;
23.   }
24.
25.   /* Broadcast each of the chunks */
26.   for ( i=0 ; i<FCS_processes ; i++ ) {
27.     MPI_Bcast((i==FCS_process_id)?faFit:faTmp,iCnt,MPI_FLOAT,i,MPI_COMM_WORLD);
28.     if ( i!=FCS_process_id )
29.       for ( l=0,j=i*iFrag,k=(i+1)*iFrag ; j<k ; j++,l++ )
30.         pp->prs[j]->fFitness = faTmp[l];
31.   }
32. }


Fig. 7 This figure presents an implementation of the proposed parallel evaluation scheme using C and MPI.
This piece of code is the only one modified to provide such parallelization capabilities.
Each processor computes which individuals are assigned to it (lines 6-10), then it computes the fitness (lines
10-23), and then it broadcasts the computed fitness (lines 26-31)


Llora et al. (2007) have shown that evolving such rules is possible. In order to promote
maximally general and maximally accurate rules a la XCS (Wilson 1995), we compute the
accuracy ($\alpha$) and the error ($\varepsilon$) of a rule (Llora et al. 2005). The accuracy is
the proportion of overall examples correctly classified, and the error is the proportion of
incorrect classifications issued. For simplicity, we use the proportion of correctly issued
classifications instead, which simplifies the final fitness calculation. Let $n_{t+}$ be the
number of positive examples correctly classified, $n_{t-}$ the number of negative examples
correctly classified, $n_m$ the number of times a rule has been matched, and $n_t$ the number
of examples available. Using these values, the accuracy and error of a rule $r$ can be computed as:

$$\alpha(r) = \frac{n_{t+}(r) + n_{t-}(r)}{n_t} \qquad (3)$$

$$\varepsilon(r) = \frac{n_{t+}(r)}{n_m(r)} \qquad (4)$$

Once the accuracy and error of a rule are known, the fitness can be computed as
follows.

$$f(r) = \alpha(r) \cdot \varepsilon(r)^{\gamma} \qquad (5)$$

where $\gamma$ is the error penalization coefficient. The above fitness measure favors rules with a
good classification accuracy and a low error, that is, maximally general and maximally accurate
rules. By increasing $\gamma$, we can bias the search towards correct rules. This is an important
element because assembling a rule set based on accurate rules guarantees the overall
performance of the assembled rule set. In our experiments, we have set $\gamma$ to 18 to strongly
bias the search toward maximally general and maximally accurate rules.
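As a concrete reading of the fitness computation, the sketch below evaluates accuracy, error term, and fitness from the four counts. The `RuleCounts` type and function names are hypothetical, the exponent is assumed to be a non-negative integer (as with the value 18 used in the experiments), and returning 0 for a rule that never matched is a choice made here, not something stated in the text.

```c
#include <assert.h>

/* Counts gathered while matching one rule against the training set. */
typedef struct {
    int n_tp;       /* positive examples correctly classified */
    int n_tn;       /* negative examples correctly classified */
    int n_matched;  /* times the rule matched                 */
    int n_total;    /* examples available                     */
} RuleCounts;

/* Integer power by repeated multiplication (avoids needing libm). */
static double ipow(double base, unsigned exp) {
    double r = 1.0;
    while (exp--) r *= base;
    return r;
}

/* Proportion of all examples that are correctly classified. */
static double rule_accuracy(const RuleCounts *c) {
    assert(c->n_total > 0);
    return (double)(c->n_tp + c->n_tn) / c->n_total;
}

/* Proportion of correct classifications among those issued. */
static double rule_error_term(const RuleCounts *c) {
    return c->n_matched > 0 ? (double)c->n_tp / c->n_matched : 0.0;
}

/* Fitness: accuracy times the error term raised to the penalization
   coefficient. */
static double rule_fitness(const RuleCounts *c, unsigned gamma) {
    return rule_accuracy(c) * ipow(rule_error_term(c), gamma);
}
```

For a rule with accuracy 0.9 whose issued classifications are correct 80% of the time, the fitness is 0.72 with an exponent of 1 but drops to roughly 0.016 with an exponent of 18, which illustrates how strongly the large exponent penalizes rules that make mistakes.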

NAX's efficient implementation of the evolutionary process is based on the techniques
described above: hardware acceleration (Sect. 4.1.1) and coarse-grain parallelism
(Sect. 4.1.2). The genetic algorithm used was a modified version of the simple genetic
algorithm (Goldberg 1989) using tournament selection (s = 4), one-point crossover, and
mutation based on generating new random boundary elements.
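Two of the operators named above can be sketched generically. This is a textbook-style illustration, not the modified algorithm used in NAX: the function names are hypothetical, the tournament draws with replacement, and a small linear congruential generator is used only to keep the sketch reproducible.

```c
#include <assert.h>
#include <stddef.h>

/* Small linear congruential generator so the sketch is reproducible. */
static unsigned next_rand(unsigned *state) {
    *state = *state * 1664525u + 1013904223u;
    return *state >> 16;
}

/* Tournament selection: draw s individuals uniformly at random (with
   replacement) and return the index of the fittest one. */
static size_t tournament(const double *fitness, size_t n, size_t s,
                         unsigned *rng) {
    assert(n > 0 && s > 0);
    size_t best = next_rand(rng) % n;
    for (size_t t = 1; t < s; t++) {
        size_t rival = next_rand(rng) % n;
        if (fitness[rival] > fitness[best]) best = rival;
    }
    return best;
}

/* One-point crossover on bound vectors of length n: the two parents swap
   the tails that start at position cut (0 < cut < n). */
static void one_point_crossover(float *a, float *b, size_t n, size_t cut) {
    assert(cut > 0 && cut < n);
    for (size_t i = cut; i < n; i++) {
        float tmp = a[i];
        a[i] = b[i];
        b[i] = tmp;
    }
}
```

A call such as `tournament(fitness, n, 4, &rng)` corresponds to the tournament size s = 4 reported in the text; larger tournaments increase the selection pressure.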


5  Experiments 

This section presents the results achieved using NAX. To allow the reader to compare with
other techniques, we compare the results obtained using NAX on small data sets provided by
the UCI repository (Merz and Murphy 1998) with those of other well-known supervised learning
algorithms. Finally, we present the first results on prostate tissue prediction obtained
using NAX. Results focus on the viability of the NAX approach.


5.1  Some  UCI  repository  data  sets 

The UCI repository (Merz and Murphy 1998) provides several data sets for different
machine learning problems. These data sets have been widely used to test traditional
machine learning and GBML techniques. Table 1 lists the data sets used. Due to the nature
of the prostate tissue type classification, we only chose data sets with numeric attributes.
Three of these data sets are of particular interest: (1) son, by far the one with the largest
dimensionality, (2) gls, the one with the largest number of classes, and (3) tao, proposed by
Llora and Garrell (2001), which has complex and non-linear boundaries.


Table 1 Summary of the data sets used in the experiments

ID   Data set                 Size  Missing     Numeric     Nominal     Classes
                                    values (%)  attributes  attributes
bre  Wisconsin Breast Cancer   699  0.3          9          -           2
bpa  Bupa Liver Disorders      345  0.0          6          -           2
gls  Glass                     214  0.0          9          -           6
h-s  Heart Stats-Log           270  0.0         13          -           2
ion  Ionosphere                351  0.0         34          -           2
irs  Iris                      150  0.0          4          -           3
son  Sonar                     208  0.0         60          -           2
tao  Tao                      1888  0.0          2          -           2
win  Wine                      178  0.0         13          -           3
Table 2 Experimental results: percentage of correct classifications and standard deviation from stratified
ten-fold cross-validation runs

ID   0-R            C4.5            NAX
bre  65.52 ± 1.16   95.42 ± 1.69    96.43 ± 1.72
bpa  57.97 ± 1.23   65.70 ± 3.84    64.07 ± 8.36
gls  35.51 ± 4.49   65.89 ± 10.47   68.02 ± 8.69
h-s  55.55 ± 0.00   76.30 ± 5.85    75.56 ± 9.39
ion  64.10 ± 1.19   89.74 ± 5.23    89.19 ± 5.27
irs  33.33 ± 0.00   95.33 ± 3.26    94.67 ± 4.98
son  53.37 ± 3.78   71.15 ± 8.54    73.62 ± 9.72
tao  49.79 ± 0.17   95.07 ± 2.11    97.41 ± 0.92
win  39.89 ± 3.22   93.82 ± 2.85    94.34 ± 6.09

Paired t-test comparisons showed no statistically significant differences between C4.5 and NAX results.
0-R results are provided only as a guiding baseline


We could have chosen more complex algorithms as baselines for NAX. However, we would
not be able to use them to repeat the experimentation on the prostate tissue classification
domain. The algorithms used in the comparison presented in Table 2 were 0-R (Holte
1993), a simple baseline based on majority class classification, and C4.5 (Quinlan 1993).
Results show the percentage of correct classifications and the standard deviation from stratified
ten-fold cross-validation runs. Paired t-test comparisons showed no statistically significant
differences between the pruned tree produced by C4.5 and NAX results. These experiments
also helped validate the distributed implementation proposed by NAX. Further results on
empirical comparisons can be found elsewhere (Bernado et al. 2001; Bacardit and Butz
2006).


5.2  Prostate  tissue  classification 

With the previous results at hand, we ran NAX against the prostate tissue classification data
set. The original data set is defined by 93 attributes. In this paper, however, we used the
reduced version of this data set proposed by Fernandez et al. (2005), which contains 20
selected attributes out of the 93 available. The data set is formed by 171,314 records. Our goal
was to explore how well NAX could generalize over unseen tissue, which is the first step toward
addressing the cancer prediction problem. The other reason that motivated such
experimentation was to achieve accuracy similar to that published earlier by
Fernandez et al. (2005) using a modified Bayes technique. If NAX could perform at the
same level, we would also obtain a set of rules of interest to the spectroscopist. The
interpretation of the rules will provide insight into the models produced by
NAX, which could not be done with the models used earlier by Fernandez et al. (2005).

We conducted stratified ten-fold cross-validation experiments to measure the generalization
capabilities of NAX for this problem. Since the problem was rather small (larger
data sets are being prepared to be run at the supercomputing facilities provided by the
National Center for Supercomputing Applications), we ran the ten-fold cross-validation
runs on a 3 GHz dual-core Pentium D computer with 4 GB of RAM. NAX took advantage of
the hardware support to speed up the matching process and used two MPI processes to
parallelize the evaluation of the overall population, as introduced in Fig. 6. Each fold
took about one hour to complete, with the entire classification lasting less than half a day.
We conducted a simple test of adding a second computer with an identical configuration.
The overall time for cross-validation was reduced to half. Rough estimates, which will
be better measured when larger experiments are conducted on NCSA supercomputers, show
that the sequential portion is around 1:1000 for this small data set. This proportion improves
as the data set grows, which suggests that we will be able to process very large data sets
and efficiently exploit larger numbers of processors.
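The impact of the sequential portion can be estimated with Amdahl's law. The sketch below is a generic statement of the law, not code from NAX; the function name `amdahl_speedup` is hypothetical, and reading "1:1000" as a sequential fraction of 0.001 is an assumption.

```c
#include <assert.h>

/* Amdahl's law: speedup on p processors when a fraction seq of the
   run time is strictly sequential (0 <= seq <= 1). */
static double amdahl_speedup(double seq, int p) {
    assert(seq >= 0.0 && seq <= 1.0 && p > 0);
    return 1.0 / (seq + (1.0 - seq) / p);
}
```

Under that assumption, `amdahl_speedup(0.001, 2)` is just under 2, in line with the halving of the run time reported when the second computer was added, and the law caps the achievable speedup at 1000 regardless of the number of processors. Communication cost is not part of this model.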

We propose another measure of effectiveness, namely how many records can be
processed per second. Using a single processor with the hardware acceleration mechanisms
built into NAX, and the evolved rule set formed by 1,028 rules, the average throughput was
around 60,000 records per second. For the prostate tissue classification, it took less than
three seconds to classify the entire data set. Once the rule set is learnt, the classification
problem falls again into the category of embarrassingly parallel problems (Grama et al.
2003). Since no communication is needed, the speedup grows linearly with the number of
processors added, given the proper rule set replication and data set chunking. Thus, with
the dual-core box used we were able to double the throughput (120,000 records per
second) by chunking the data set and using both processors.
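As a quick consistency check of these figures, the classification time follows directly from the data set size and the reported throughput (the symbols $t_1$ and $t_2$, for one and two processors, are introduced here only for this calculation):

```latex
t_{1} \approx \frac{171{,}314\ \text{records}}{60{,}000\ \text{records/s}} \approx 2.86\ \text{s},
\qquad
t_{2} \approx \frac{171{,}314\ \text{records}}{120{,}000\ \text{records/s}} \approx 1.43\ \text{s}
```

Both values are consistent with the statement that the whole data set is classified in less than three seconds.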

The previous results show the benefits of hardware acceleration and parallelization, but
NAX was also able to achieve very competitive classification accuracy in generalization,
correctly classifying 97.09 ± 0.09% of the records (pixels) during the stratified ten-fold
cross-validation. Figure 8 presents the regenerated prostate tissue classification image
presented in Fig. 2 using a rule set assembled by NAX. Figure 8b presents the incorrectly
classified pixels. Most of the mistakes by the rule set involve similar tissues with few
training records available. This trend was also shown elsewhere (Fernandez et al. 2005).
C4.5 does not provide any statistically significant improvement (only a marginal 0.7%)
and produced large decision trees with more than 5,000 leaves, not to mention its lack of
scalability when compared to NAX.

The rule set assembled by NAX represents an incremental assembling of maximally
general and maximally accurate rules. Thus, we can compute how the accuracy of such an
ensemble improves as new rules are added. Figure 9 presents the overall accuracy as rules
are added. It shows an interesting behavior for classifying prostate tissue. Using only 20
rules out of the 1,028 evolved ones, 90% of the pixels are correctly classified, 1.3% are
incorrectly classified, and 8.7% are left unclassified. Inspection shows that most of the
misclassified pixels belong to borders between tissues, where mislabeling arises from the
image discretization (one pixel containing different tissue types). Table 3 presents the
initial four rules, which cover 80% of the instances belonging to the two largest tissue types,
epithelium and fibrous stroma. Such results are relevant, not only for their accuracy, but
also because of the insight they provide to the spectroscopist about the problem structure.
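Evaluating a prefix of the decision list, as done for the accuracy curve, can be sketched as follows. The one-dimensional `Rule` and `Example` types and the function name `evaluate_prefix` are hypothetical simplifications; the point is only the three-way tally of correct, incorrect, and unclassified records when the first k rules are used.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float lo, hi; int cls; } Rule;   /* 1-D interval rule */
typedef struct { float x; int cls; } Example;
typedef struct { size_t correct, wrong, unclassified; } Tally;

/* Classifies every example with the first k rules of the list; the
   first rule that matches decides, later rules are ignored. */
static Tally evaluate_prefix(const Rule *rules, size_t k,
                             const Example *ex, size_t n) {
    Tally t = {0, 0, 0};
    for (size_t i = 0; i < n; i++) {
        int pred = -1;                       /* -1 means no rule matched */
        for (size_t j = 0; j < k && pred < 0; j++)
            if (ex[i].x >= rules[j].lo && ex[i].x <= rules[j].hi)
                pred = rules[j].cls;
        if (pred < 0)               t.unclassified++;
        else if (pred == ex[i].cls) t.correct++;
        else                        t.wrong++;
    }
    return t;
}
```

Calling the function for k = 1, 2, 3, and so on yields the points of an accuracy curve, with the unclassified share shrinking as more rules are included.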


6  Conclusions  and  further  work 

This paper has presented the initial results achieved in predicting prostate tissue type using
GBML techniques. Being able to classify unseen tissue quickly, reliably, and accurately is
the first step towards the creation of CADx systems that may assist a pathologist
diagnosing prostate cancer. We have proposed two main efficiency enhancement techniques for
GBML, exploiting hardware parallelization via vector instructions and coarse-grain
parallelism via MPI libraries, which allowed us to approach very large data sets.
These techniques, together with an incremental genetics-based rule learning approach to
Fig. 8 The figures show the regenerated prostate tissue classification image presented in Fig. 2.
(a) presents the correctly classified pixels, (b) presents the incorrectly classified pixels


assemble rule sets formed by maximally general and maximally accurate rules, have led to
the creation of NAX, a system specialized in dealing with large data sets.

Results  have  shown  accurate  classification  models  for  prostate  tissue  along  with  good 
scalability  of  the  NAX  implementation.  Results  also  reveal  peculiarities  of  the  underlying 
problem  structure.  With  very  few  rules — 20 — we  were  able  to  correctly  classify  up  to  90% 
Fig. 9 The rule set as a decision list. The figure presents the classification accuracy as we
keep adding rules to the decision list (x-axis: number of rules used). The first 20 rules are
able to cover 91% of the records with a classification accuracy of 98.5%, which yields the 90%
overall accuracy presented in the figure



Table 3 First four maximally general and maximally accurate rules that compose the final rule set. The
rule set is treated as a decision list, thus we can easily incrementally evaluate the value of the initial four
ones

Rule  Rule condition                                     Tissue type       Accumulated   Covered
                                                                           accuracy (%)  records (%)
1.    0.10 < a_1 < 0.25 ∧ 0.00 < a_? < 0.04 ∧            → Fibrous stroma  41.32         41.96
      1.07 < a_8 < 2.01 ∧ -0.07 < a_16 < 0.16 ∧
      0.25 < a_? < 2.86 ∧ 0.11 < a_18 < 0.21
2.    0.03 < a_1 < 0.11 ∧ 0.05 < a_7 < 0.20 ∧            → Epithelium      68.53         69.61
      1231.88 < a_12 < 1247.90 ∧ 1.98 < a_? < 3.83 ∧
      0.13 < a_18 < 0.20
3.    0.07 < a_? < 0.16 ∧ 0.14 < a_? < 0.41 ∧            → Fibrous stroma  71.59         72.75
      0.71 < a_10 < 1.13 ∧ 1527.54 < a_15 < 1533.80 ∧
      0.65 < a_? < 1.50
4.    0.05 < a_? < 0.09 ∧ 0.76 < a_4 < 1.29 ∧            → Epithelium      80.78         82.08
      1.80 < a_6 < 2.08 ∧ 0.17 < a_7 < 0.24 ∧
      0.26 < a_? < 0.53 ∧ 2.79 < a_? < 7.01 ∧
      0.21 < a_? < 0.32


of  the  tissue.  Our  current  work  is  focused  on  analyzing  the  larger  data  sets  containing  all 
the  available  features  and  different  tissue  sources  to  test  the  parallelization  scalability  of 
NAX  on  NCSA  supercomputers.  Once  accomplished,  the  procedure  will  provide  confidence 
in  creating  a  CADx  system  to  generate  a  diagnosis  based  on  the  evolved  models. 

Acknowledgments We would like to thank David E. Goldberg for his continual support and
encouragement, allowing us to have access to the IlliGAL resources. Thanks also to Kumara Sastry for hallway
discussions and to the Automated Learning Group and the Data-Intensive Technologies and Applications at
the National Center for Supercomputing Applications for hosting this joint collaboration. This work was
sponsored by the Air Force Office of Scientific Research, Air Force Materiel Command, USAF, under grant


Springer 


Histopathology  using  genetics-based  machine  learning 


FA9550-06-1-0370, the National Science Foundation under grant IIS-02-09199, and the National Institutes of
Health. The US Government is authorized to reproduce and distribute reprints for Government purposes
notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the
authors and should not be interpreted as necessarily representing the official policies or endorsements, either
expressed or implied, of the Air Force Office of Scientific Research, the National Science Foundation, or the
US Government. Rohit Bhargava would like to acknowledge collaborators over the years, especially Dr.
Stephen M. Hewitt and Dr. Ira W. Levin of the National Institutes of Health, for numerous useful
discussions and guidance. Funding for this work was provided in part by the University of Illinois Research Board
and by the Department of Defense Prostate Cancer Research Program. This work was also funded in part by
the National Center for Supercomputing Applications and the University of Illinois, under the auspices of
the NCSA/UIUC faculty fellows program.


References 

Amdahl G (1967) Validity of the single processor approach to achieving large-scale computing capabilities.
In Proceedings of the American federation of information processing societies conference (AFIPS),
vol 30. AFIPS, pp 483-485

Bacardit  J,  Butz  M  (2006)  Advances  at  the  frontier  of  Learning  Classifier  Systems.  Chapter  data  mining  in 
Learning  Classifier  Systems:  Comparing  XCS  with  GAssist,  vol  I.  Springer 
Bacardit  J,  Krasnogor  N  (2006)  Biohel:  Bioinformatics-oriented  hierarchical  evolutionary  learning 
(Nottingham  ePrints).  University  of  Nottingham 
Barry A, Drugowitsch J (1997) LCSWeb: the LCS wiki. http://www.lcsweb.cs.bath.ac.uk/

Bernado  E,  Llora  X,  Garrell  J  (2001)  Advances  in  Learning  Classifier  Systems:  4th  international  workshop 
(IWLCS  2001).  Chapter  XCS  and  GALE:  a  comparative  study  of  two  Learning  Classifier  Systems  with 
six  other  learning  algorithms  on  classification  tasks.  Springer  Berlin,  Heidelberg,  pp  115-132 
Bhargava R, Fernandez D, Hewitt S, Levin I (2006) High throughput assessment of cells and tissues:
Bayesian classification of spectral metrics from infrared vibrational spectroscopic imaging data.
Biochimica et Biophysica Acta 1758(7):830-845

Cantu-Paz  E  (2000)  Efficient  and  accurate  parallel  genetic  algorithms.  Kluwer  Academic  Publishers 
Cordon  O,  Herrera  F,  Hoffmann  F,  Magdalena  L  (2001)  Genetic  fuzzy  systems.  Evolutionary  tuning  and 
learning  of  fuzzy  knowledge  bases.  World  Scientific 

Fernandez D, Bhargava R, Hewitt S, Levin I (2005) Infrared spectroscopic imaging for histopathologic
recognition. Nat Biotechnol 23(4):469-474

Flockhart I (1995) GA-MINER: parallel data mining with hierarchical genetic algorithms (final report).
(Technical Report EPCCAIKMS-GA-MINER-REPORT 1.0). University of
Edinburgh

Gabriel E, Fagg G, Bosilca G, Angskun T, Dongarra J, Squyres J, Sahay V, Kambadur P, Barrett B,
Lumsdaine A, Castain R, Daniel D, Graham R, Woodall T (2004) Open MPI: goals, concept, and design
of a next generation MPI implementation. In Proceedings of the 11th European PVM/MPI Users' group
meeting. Springer

Goldberg  D  (1989)  Genetic  algorithms  in  search,  optimization,  and  machine  learning.  Addison-Wesley 
Professional 

Goldberg  D  (2002)  The  design  of  innovation:  lessons  from  and  for  competent  genetic  algorithms.  Springer 
Grama  A,  Gupta  A,  Karypis  G,  Kumar  V  (2003)  Introduction  to  parallel  computing.  Addison-Wesley 
Holte  R  (1993)  Very  simple  classification  rules  perform  well  on  most  commonly  used  datasets.  Mach  Learn 
11:63-91 

Lattouf  J-B,  Saad  F  (2002)  Gleason  score  on  biopsy:  is  it  reliable  for  predcting  the  final  grade  on  pathology? 
BJU  Int  90:694-699 

Levin  I,  Bhargava  R  (2005)  Fourier  transform  infrared  vibrational  spectroscopic  imaging:  integrating 
microscopy  and  molecular  recognition.  Annu  Rev  Phys  Chem  56:  429-474 
Llora  X  (2002)  Genetics-based  machine  learning  using  fine-grained  parallelism  for  data  mining.  Doctoral 
dissertation,  Enginyeria  i  Arquitectura  La  Salle.  Ramon  Llull  University,  Barcelona,  Catalonia,  Euro¬ 
pean  Union 

Llora  X  (2006)  Learning  Classifier  Systems  and  other  genetics-based  machine  learning  Blog. 
http://www-illigal.ge. uiuc.edulcs-n-gbml/ 

Llora  X,  Garrell  J  (2001)  Knowledge-independent  data  mining  with  fine-grained  parallel  evolutionary 
algorithms.  In  Proceedings  of  the  genetic  and  evolutionary  computation  conference  (GECCO’2001). 
Morgan  Kaufmann  Publishers,  pp  461-468 


Springer 


X.  Llora  et  al. 


Llora  X,  Goldberg  D  (2003)  Bounding  the  effect  of  noise  in  multiobjective  Learning  Classifier  Systems. 
Evol  Comput  J  ll(3):279-298 

Llora  X,  Sastry  K  (2006)  Fast  rule  matching  for  Learning  Classifier  Systems  via  vector  instructions.  In 
Proceedings  of  the  2006  genetic  and  evolutionary  computation  conference.  ACM  Press,  pp  1513-1520 
Llora  X,  Sastry  K,  Goldberg  D  (2005)  The  compact  classifier  system:  motivation,  analysis  and  first  results. 
In  Proceedings  of  the  congress  on  evolutionary  computation,  vol  1.  IEEE  press,  (Also  as  IlliGAL  TR  No 
2005019,  pp  596-603) 

Llora  X,  Sastry  K,  Goldberg  D,  de  la  Ossa  L  (2007)  The  %-ary  extended  compact  classifier  system:  linkage 
learning  in  Pittsburgh  LCS.  In  Advances  at  the  frontier  of  Learning  Classifier  Systems,  vol  II.  IlliGAL 
report  no  2006015.  Springer,  pp  (in  preparation) 

Merz  CJ,  Murphy  PM  (1998)  UCI  repository  for  machine  learning  data-bases,  http://www.ics.uci. 

edu/  ~  mlearn/MLRepository.html 
Mitchell  T  (1997)  Machine  learning.  McGraw  Hill 

Orriols-Puig  A,  Bernado-Mansilla  E  (2006)  A  further  look  at  UCS  classifier  system.  In  Proceedings  of  the 
8th  annual  conference  on  genetic  and  evolutionary  computation  workshop  program.  ACM  Press 
Quinlan  JR  (1993)  C4.5:  Programs  for  machine  learning.  Morgan  Kaufmann 
Stone  C,  Bull  L  (2003)  For  real!  XCS  with  continuous-valued  inputs.  Evol  Comput  J  ll(3):279-298 
Wilson  S  (1995)  Classifier  fitness  based  on  accuracy.  Evol  Comput  3(2):  149-175 
Wilson  S  (2000a)  Get  real!  XCS  with  continuous-valued  inputs.  Lect  Notes  Comput  Sci  1813:209-219 
Wilson  S  (2000b)  Mining  oblique  data  with  xcs.  In  Revised  papers  of  the  3th  international  workshop  on 
Learning  Classifier  Systems  (IWLCS  2000).  Springer,  pp  158-176 


Springer 


Towards  Better  than  Human  Capability  in  Diagnosing 
Prostate  Cancer  Using  Infrared  Spectroscopic  Imaging 


Xavier Llora¹, Rohith Reddy²,³, Brian Matesic², and Rohit Bhargava²,³

¹National Center for Supercomputing Applications (NCSA)
²Department of Bioengineering
³Beckman Institute for Advanced Science and Technology
University of Illinois at Urbana-Champaign, Urbana IL 61801
xllora@uiuc.edu, rkreddy2@uiuc.edu, matesic2@uiuc.edu, rxb@uiuc.edu


ABSTRACT 

Cancer diagnosis is essentially a human task. Almost universally,
the process requires the extraction of tissue (biopsy) and
examination of its microstructure by a human. To improve diagnoses
based on limited and inconsistent morphologic knowledge, a new
approach has recently been proposed that uses molecular
spectroscopic imaging to utilize microscopic chemical composition
for diagnoses. In contrast to visible imaging, the approach results
in very large data sets as each pixel contains the entire molecular
vibrational spectroscopy data from all chemical species. Here, we
propose data handling and analysis strategies to allow
computer-based diagnosis of human prostate cancer by applying a
novel genetics-based machine learning technique (NAX). We apply
this technique to demonstrate both fast learning and accurate
classification that, additionally, scales well with
parallelization. Preliminary results demonstrate that this approach
can improve current clinical practice in diagnosing prostate cancer.

1. INTRODUCTION

Pathologist opinion of structures in stained tissue is the
definitive diagnosis for almost all cancers and provides critical
input for therapy. In particular, prostate cancer accounts for
one-third of noncutaneous cancers diagnosed in US men, and it is a
leading cause of cancer-related death. Hence, it is, appropriately,
the subject of heightened public awareness and widespread
screening. If prostate-specific antigen (PSA) or digital rectal
screens are abnormal, a biopsy is considered to detect or rule out
cancer. Prostate tissue is extracted, or biopsied, from the patient
and examined for structural alterations. The diagnosis procedure
involves the removal of cells or tissues, staining them with dyes
to provide visual contrast and examination under a microscope by a
skilled person (pathologist).

Due to personnel, training, natural variability and biologic
differences, the challenge in prostate cancer research and practice
is to provide accurate, objective and reproducible decisions.
Conventional optical microscopy followed by manual recognition has
been demonstrated to be inadequate for this task [18]. Hence, we
have recently proposed developing a practical approach to this
problem using chemical, rather than morphologic, imaging [19]. In
this approach, Fourier transform infrared (FTIR) imaging is
employed to provide the entire vibrational spectroscopic
information from every pixel of a sample’s microscopy image. While
the first steps of developing novel imaging and sampling
technologies are now reliable [7], the computational challenge of
providing robust classification algorithms that can rapidly provide
decisions remains. Due to the above advances in imaging and
sampling, data from thousands of patients are available to train
and validate algorithms for different disease states. While the
application and type of data are unique, a further confounding
factor is the need to efficiently process the large volumes of data
generated by FTIR imaging. The classification problem can be
formulated as a supervised learning problem in which several
million pixels (hundreds of gigabytes) of accurately labeled data
are available for model training and validation. The volume of
tissue and the (future) need for intra-operative diagnoses imply
that rapid and accurate diagnoses are crucial to allow physicians
to explore all possible courses of action. Under these conditions,
traditional supervised learning approaches and implementations do
not scale to provide diagnoses in an appropriate time frame. Hence,
efficiently processing and learning models from gigabytes of FTIR
imaging data requires a careful design of the supervised learning
algorithm. Moreover, the biological nature of the problem requires
that such models be interpretable to provide fundamental new
insight into the disease process. Genetics-based machine learning
(GBML) techniques take advantage of the “quasi embarrassing
parallelism” [17] to provide scalable, fast, accurate, reliable,
and interpretable models. In this paper we present an approach
engineered to the desired solutions and constraints of addressing
this human task. A modified version of a sequential genetics-based
rule learner is developed that exploits massive parallelism via the
message passing interface (MPI) and efficient rule matching using
hardware-oriented operations. We named this system NAX [24], and we
have shown that its performance is comparable to traditional and
genetics-based machine learning techniques on an array of publicly
available data sets. We now show that NAX, taking advantage of both
hardware and software parallelism, is able to provide prostate
cancer diagnoses that are human-competitive. In this paper, we
present preliminary results supporting this outcome.

The paper is structured as follows. Section 2 provides an overview
of our approach towards computer-aided diagnoses for prostate
cancer. The procedure and form of the data are summarized in
section 3. NAX is introduced in section 4, where we describe the
basic components and design decisions in this approach. In section
5 we present preliminary results indicating that the approach
presented in this paper is human-competitive. Finally, section 6
summarizes some conclusions and further research.

2.  PROBLEM  DESCRIPTION 

Prostate cancer is the most common non-skin malignancy in the
western world. The American Cancer Society estimated 234,460 new
cases of prostate cancer in 2006 [31]. Recognizing the public
health implications of this disease, men are actively screened
through digital rectal examinations and/or serum prostate-specific
antigen (PSA) level testing. If these screening tests are
suspicious, prostate tissue is extracted, or biopsied, from the
patient and examined for structural alterations. Due to imperfect
screening technologies and repeated examinations, it is estimated
that more than 1 million people undergo biopsies in the US alone.

2.1  Prostate  Cancer  Diagnosis 

The removal of a small section of prostate is most often
accomplished by core biopsy. A needle is inserted into the tissue
and several (6-23) samples are obtained from different positions.
Biopsy, followed by manual examination under a microscope, is the
primary means to definitively diagnose prostate cancer as well as
most internal cancers in the human body. Pathologists are trained
to recognize patterns of disease in the architecture of tissue,
local structural morphology and alterations in cell size and shape.
Specific patterns of specific cell types distinguish cancerous and
non-cancerous tissues. Hence, the primary task of the pathologist
examining tissue for cancer is to locate foci of the cell of
interest and examine them for alterations indicative of disease.

The specific cells in which cancer arises in the prostate are
epithelial cells. While epithelial-origin cancers account for over
85% of all human cancers, they account for more than 95% of
prostate cancers. In prostate tissue, epithelial cells line
secretory ducts within the structural cells (collectively termed
‘stroma’) that allow the tissue to maintain its structure and
function. Hence, a pathologist will first locate epithelial cells
in a biopsy and, to examine for cancer, will mentally segment them
from stroma.

Biopsy samples are prepared in a specific manner to aid in
recognition of cells and disease. The sample is sliced thin (~5 µm
thickness), placed on a glass slide and stained with a dye to
provide contrast. The most common dye is a mixture of hematoxylin
and eosin (H&E), which stains protein-rich regions pink and nucleic
acid-rich regions blue. Empty space, lipids and carbohydrates are
typically not stained and are characterized by white color in
images. Staining allows the pathologist to identify cells based on
their nucleus and extranuclear regions. Patterns of the same cell
type characterize structures. For example, epithelial cells
arranged in a circular manner around empty space are characteristic
of a duct and endothelial cells similarly arranged are
characteristic of blood vessels. The empty space enclosed within a
duct in pathology images is termed a lumen. The distortion of the
circular pattern of epithelial cells around a lumen is
characteristic of cancer.

In low severity cancers, lumens are only slightly distorted, while
higher grades of cancer display a lack of lumen and simply consist
of masses of epithelial cells supported by little stroma. The
relative distortion and change in lumen shape is organized into a
grading scheme to assess the severity of the disease, the Gleason
scoring system, which is the primary measure of disease that
defines diagnosis, helps direct therapy and helps identify those in
danger of dying from the disease. Since prostate cancer is
multi-focal and the disease quite variable, two dominant patterns
of epithelial distortion are selected and each is independently
graded on a scale of 1-5. The grades are then summed to provide a
Gleason score ranging from 2 (low grade cancer) to 10 (most
dangerous cancer). This scale has been widely used since its
creation in the 1960s and currently forms the clinical standard of
practice. Manual Gleason scoring, however, has severe limitations.
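
The scoring arithmetic just described can be written down directly.
The following is a minimal sketch, not clinical software: the
function name `gleason_score` and its parameter names are
hypothetical, and the sketch covers only the summation step, not how
the two dominant patterns are selected or graded.

```python
def gleason_score(first_grade: int, second_grade: int) -> int:
    """Sum the grades (1-5) of two dominant patterns into a score (2-10)."""
    for grade in (first_grade, second_grade):
        if not 1 <= grade <= 5:
            raise ValueError(f"grade must be between 1 and 5, got {grade}")
    return first_grade + second_grade
```

Because each grade is bounded to 1-5, the sum can only fall in the
2-10 range quoted in the text.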

2.2  Limitations  of  Current  Practice 

Widespread screening for prostate cancer has resulted in a large
workload of biopsied men [16], placing an increasing demand on
services. Operator fatigue is well-documented and guidelines limit
the workload and rate of examination of samples by a single
operator (examination speed and throughput). Importantly, inter-
and intra-pathologist variation complicates decision-making. The
consistency in determining Gleason scores is rather poor.
Intra-observer measurements show that a pathologist confirms their
own score less than 50% of the time and is within ±1 score in no
more than 80% of cases [2]. Hence, the diagnoses for ~50% of cases
may change and may be significantly altered for ~20% of cases,
ultimately leading to changes in therapy for a patient subset [30].
The numbers are decidedly cause for concern. For example, in a
recent study including 15 pathologists and 537 prostate cancer
patients, 70.8% of Gleason scores were shown to be inaccurate when
compared with the patient’s final outcome [18]. Second opinions
[29] improve assessment and are cost-effective [10], not to mention
their utility in mitigating the effects of healthcare costs, lost
wages, morbidity, or potential litigation. In summary, the manual
recognition of spatial patterns leaves much to be desired from a
process perspective and has far-reaching social effects from a
public health perspective.

For  the  reasons  underlined  above,  there  is  an  urgent  need 
for  high-throughput,  automated  and  objective  pathology 
tools.  We  believe  that  this  need  is  best  met  by  employing 
the  power  of  computer  algorithms  and  advanced  processing 
to  address  prostate  cancer  diagnosis  and  grading. 

The information content of conventionally stained images is
limited, inherently non-specific and varies greatly within patient
populations and processing conditions. Hence, the information
derived from visible microscopy images is fundamentally limited and
automated methods of analyzing stained images have failed to
provide a sufficiently robust algorithm to diagnose disease. An
alternative to morphology-based microscopy is the use of molecular
microscopy techniques to probe disease. Molecular technologies for
disease diagnosis are an exciting avenue for investigations as they
promise better diagnostic capabilities through objective means and
a multitude of chemicals to provide insight into the changes
indicative of the disease process. In particular, spectroscopy
tools allow for the measurement of many molecular species
simultaneously. Spectroscopic techniques in imaging form, notably
using optics, further enable the analysis to be conducted without
perturbing the tissue [11]. In this manuscript, we present the
analysis of prostate tissue with one such technique, Fourier
transform infrared (FTIR) spectroscopic imaging.

2.3  Molecular  Imaging 

Infrared spectroscopy is a classical technique for measuring the
chemical composition of specimens. At specific frequencies, the
vibrational modes of molecules are resonant with the frequency of
infrared light. By monitoring all frequencies in the region, a
pattern of absorption can be created. This pattern, or spectrum, is
characteristic of the chemical composition and is hypothesized to
contain information that will help determine the cell type and
disease state of the tissue. Recently, FTIR spectroscopy has been
developed in an imaging sense. Hence, the data are similar to those
of optical microscopy, with two differences. The first difference
is that no external dyes are needed and the contrast in images can
be directly obtained from the chemical composition of the tissue.
The second is that each pixel in the visible image contains RGB
values, whereas each pixel in IR imaging contains several thousand
values across a bandwidth (2000-14000 nm) that is ~40 times larger
than the visible spectrum (400-700 nm) [7].

3.  DATA  AND  METHODOLOGY 
3.1  Experimental  Details 

Prostate tissues were obtained from the Cooperative Human Tissue
Network for the tissue array research program (TARP) laboratory.
Using these tissues, tissue microarrays were prepared using a
Beecher automated tissue arrayer containing a video overlap system
and 0.6 mm needles. Appropriate institutional review board and
National Institutes of Health (USA) guidelines for the protection
of human subjects were followed. 5 µm sections of tissue were
floated on an infrared transmissive optical window for FTIR
spectroscopic imaging. Another 5 µm section obtained from the same
point on the tissue specimen was observed using traditional
microscopy for comparison. Expert pathologists determined the
tissue classification using these microscopy samples by staining
with H&E. Pathologists’ classifications were used as the ‘gold
standard’ for comparison with the results from the methods
mentioned in this paper.

Figure 1: Conventional Staining and Automated Recognition by
Chemical Imaging. (A) Typical H&E stained sample, in which
structures are deduced from experience by a human. Highlighting of
specific regions in the manner of H&E is possible using FTIR
imaging without stains. (B) Absorption at 1080 cm⁻¹, commonly
attributed to nucleic acids, and (C) to proteins of the stroma. The
data obtained is 3 dimensional (D), from which spectra (E) or
images at specific spectral features may be plotted.

Tissues were analyzed using a Michelson interferometer attached to
a microscope (Perkin-Elmer Spotlight 300) in transmission mode at a
resolution of 4 cm⁻¹. The sample was then raster scanned to obtain
images of the entire specimen. Typical specimen size is 600 µm x
600 µm with each pixel being 6.25 µm x 6.25 µm on the sample plane.
Spectra are composed of 1,641 sample points over the spectral range
4,000-720 cm⁻¹. Data acquisition using these techniques required 40
minutes per cylindrical core of the tissue microarray to yield a
root mean square signal to noise ratio of 500:1. A typical array
was composed of approximately 2.5 million pixels and required 40 GB
of storage space.

The data obtained from FTIR imaging is three-dimensional. The x-
and y-dimensions locate pixels on the tissue-sample plane. The
z-dimension values compose the IR spectrum for the corresponding
pixel. The spectra can be analyzed to determine what type of tissue
(epithelium, stroma, or muscle) the specimen is as well as whether
the tissue is malignant or benign. We have developed this
technology to provide data from tissue in minutes and employ a
high-throughput sampling strategy using Tissue Microarrays (TMA) to
obtain data [19]. Samples from multiple tissues, from multiple
patients and multiple clinical settings are included in the data
set to maximize the sampling of natural variability and ensure the
development of robust analysis algorithms. These high-throughput
imaging and microarray technologies combine to provide very large
data sets (see Figure 1). A typical single core consists of 300 x
300 pixels on the x-y plane with 1641 bands on the z-axis. A tissue
microarray consists of several hundred such cores and analysis of
such large datasets (typically, tens of GB) is computationally
expensive.
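
To make the shape of the data concrete, the sketch below lays out
one core as an x-y-z cube using the figures quoted above (300 x 300
pixels, 1641 bands spanning 4,000-720 cm⁻¹). It is illustrative
only: the function names are hypothetical, and both the 4-byte value
width and the pixel-contiguous storage order are assumptions, since
the text does not state how the instrument stores its data.

```python
def wavenumber_axis(start=4000.0, stop=720.0, n_points=1641):
    """Evenly spaced wavenumbers (cm^-1) from start down to stop, inclusive."""
    step = (stop - start) / (n_points - 1)
    return [start + i * step for i in range(n_points)]


def cube_size_bytes(nx=300, ny=300, n_bands=1641, bytes_per_value=4):
    """Storage for one core, ASSUMING one fixed-width value per band."""
    return nx * ny * n_bands * bytes_per_value


def flat_index(x, y, band, ny=300, n_bands=1641):
    """Offset of (x, y, band), ASSUMING each pixel's spectrum is contiguous."""
    return (x * ny + y) * n_bands + band
```

Under these assumptions the spectral axis has a 2 cm⁻¹ sampling
interval and a single core occupies 590,760,000 bytes, so an array
of several hundred cores reaches the tens-of-gigabytes scale
mentioned in the text.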

3.2  Data  Format 

Each pixel’s z-dimension contains a spectrum characteristic of the
chemical composition of that region of the specimen. Certain
spectral quantities provide measures of chemistry. For example, the
height of each feature is proportional to its abundance, the peak
position is associated with the vibrational identity and peak shape
often reflects the multitude of environments around the molecule.
Therefore, differences in spectral characteristics can be used in
classification and these exact spectral features are termed
‘metrics’. For example, the ratio of absorbance of the spectral
peak at 1080 cm⁻¹ to the spectral peak at 1545 cm⁻¹ is commonly
used to distinguish epithelial from stromal cells. Trained
spectroscopists determine these metrics based upon examination of
spectral patterns. Hence, the reduction of full spectra to
descriptive metrics forms an intelligent dimensionality reduction
strategy. Genetic algorithms form decision rules based upon these
metrics to classify pixels by tissue type. Furthermore, the
transparency of the genetic algorithms allows the scientist to
correlate specific rules to biological features (tissue type and
cancer classification) via metrics based upon spectral
characteristics.
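
As an illustration of how a spectrum is reduced to a metric, the
sketch below computes the 1080 cm⁻¹ to 1545 cm⁻¹ absorbance ratio
mentioned above. It is a simplified stand-in rather than the
authors' procedure: the function names are hypothetical, the
absorbance is read at the nearest sampled wavenumber, and steps such
as baseline correction, which a real analysis would likely need, are
omitted.

```python
def absorbance_at(spectrum, wavenumbers, target):
    """Absorbance at the sampled wavenumber closest to target (cm^-1)."""
    nearest = min(range(len(wavenumbers)),
                  key=lambda i: abs(wavenumbers[i] - target))
    return spectrum[nearest]


def peak_ratio_metric(spectrum, wavenumbers,
                      numerator=1080.0, denominator=1545.0):
    """Ratio of two absorbances; one scalar metric per pixel spectrum."""
    bottom = absorbance_at(spectrum, wavenumbers, denominator)
    if bottom == 0:
        raise ZeroDivisionError("denominator absorbance is zero")
    return absorbance_at(spectrum, wavenumbers, numerator) / bottom
```

Applying a handful of such functions to every pixel turns a
1641-value spectrum into a short vector of real-valued attributes,
which is the form of input a rule learner can work with.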

4.  APPROACH 

In this section we review related work in the GBML community,
highlighting previous efforts to deal with large data sets. We also
present the motivation and techniques that led to the design of
NAX. Special attention is paid to the description of the hardware
and software techniques used, as well as to the design of a
scalable GBML algorithm.

4.1  Related  Background 

Bernado, Llora & Garrell [6] presented a first empirical comparison
between genetics-based machine learning (GBML) techniques and
traditional machine learning approaches. The authors reported that
GBML techniques were able to perform as well as traditional
techniques. Later on, Bacardit & Butz [3] repeated the analysis,
again obtaining similar results. Most of the experiments presented
in both papers were conducted using publicly available data sets
provided by the University of California at Irvine repository [28].
Most of the data sets are defined over tens of features and up to a
few thousand records. However, a key property of GBML approaches is
their intrinsic massive parallelism and scalability. Cantu-Paz [8]
presented how efficient and accurate genetic algorithms could be
assembled, and Llora [21] presented how such algorithms can be
efficiently used as machine learning and data mining techniques.

GBML techniques require evaluating candidate solutions against the
original data set, that is, matching the candidate solutions (e.g.
rules, decision trees, prototypes) against all the instances in the
data set. Regardless of the GBML flavor used, Llora & Sastry [25]
showed that as the problem grows, the matching process governs the
execution time. For small data sets (tens of attributes and a few
thousand records) the matching process takes more than 85% of the
overall execution time, marginalizing the contribution of the other
genetic operators. This number easily passes 99% when we move to
data sets with a few hundred attributes and a few hundred thousand
records. Such results emphasize one unique facet of GBML
approaches: scalability via exploiting massive parallelism. More
than 99% of the time required is spent on evaluating candidate
solutions. Each solution evaluation is independent of the others
and, hence, can be computed in parallel. Moreover, the evaluation
process can also be parallelized further on large data sets by
splitting and distributing the data across the computational
resources. A detailed description of the parallelization
alternatives of GBML techniques can be found elsewhere [21].
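
Because each candidate solution is evaluated independently, the
population can be cut into contiguous chunks, one per processor. The
sketch below shows one way to compute such a partition; it is an
assumption-laden illustration, not the NAX implementation. The names
`chunk_bounds` and `evaluate_population` are hypothetical, and the
loop over ranks runs sequentially here, standing in for processors
that would evaluate their chunks concurrently and exchange fitness
values.

```python
def chunk_bounds(n_items, n_workers, rank):
    """Half-open [start, stop) range of items assigned to worker `rank`."""
    base, extra = divmod(n_items, n_workers)
    # The first `extra` workers take one additional item each.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def evaluate_population(population, fitness, n_workers):
    """Evaluate each chunk separately, then merge the per-chunk results.

    On a cluster every rank would handle its own chunk at the same
    time; the sequential loop below only mimics that data flow.
    """
    gathered = []
    for rank in range(n_workers):
        start, stop = chunk_bounds(len(population), n_workers, rank)
        gathered.extend(fitness(ind) for ind in population[start:stop])
    return gathered
```

Only fitness values cross processor boundaries in this scheme, which
keeps the amount of exchanged data small compared with the cost of
matching rules against the data set.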

Currently available off-the-shelf GBML methods and software
distributions [5, 20] do not usually target very large data sets.
Three different works need to be mentioned here. Flockhart [12]
proposed and implemented GA-MINER, one of the earliest efforts to
create data mining systems based on GBML systems that scale across
symmetric multi-processors and massively parallel multi-processors.
The work reviewed different encoding and parallelization schemes
and conducted proper scalability studies. Llora [21] explored how
fine-grained parallel genetic algorithms could become efficient
models for data mining. Theoretical analyses of performance and
scalability were developed and validated with proper simulations.
Recently, Llora & Sastry [25] explored how current hardware can be
efficiently used to speed up the required matching of solutions
against the data set. These three approaches are the basis of the
incremental rule learning proposed in the next section to approach
very large data sets, such as the prostate tissue classification
one.

4.2  The  Road  to  Tractability 

NAX evolves, one at a time, maximally general and maximally
accurate rules. Once a rule has been evolved, the instances it
covers are removed and another rule is evolved and added to the
previously stored ones, forming a decision list. This process
continues until no uncovered instances are left. Llora, Sastry &
Goldberg [26] showed that maximally general and maximally accurate
rules [32] could also be evolved using Pittsburgh-style learning
classifier systems. Later, Llora, Sastry & Goldberg [27] showed
that competent genetic algorithms [15] evolve such rules quickly,
reliably, and accurately. From these early works, it can be
inferred that approaching real-world problems, such as the prostate
tissue classification and cancer diagnosis, using GBML techniques
may produce the desired byproduct: proper scalability. We discuss
next efficient implementation techniques to deal with very large
data sets using NAX [24].
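
The incremental loop described above (learn a rule, remove what it
covers, repeat) can be sketched as follows. This is a schematic
under stated assumptions, not the NAX code: all names are
hypothetical, a rule is represented as a (predicate, label) pair,
and the genetic algorithm that evolves each rule is replaced by a
caller-supplied `learn_rule` function. The `toy_learner` shown is a
trivial stand-in that memorizes one instance and exists only to make
the example runnable.

```python
def build_decision_list(instances, learn_rule, max_rules=100):
    """instances: list of (features, label) pairs.

    learn_rule(remaining) must return (matches, label), where
    matches(features) is True for the instances the rule covers.
    """
    remaining = list(instances)
    decision_list = []
    while remaining and len(decision_list) < max_rules:
        matches, label = learn_rule(remaining)
        uncovered = [inst for inst in remaining if not matches(inst[0])]
        if len(uncovered) == len(remaining):
            break  # the rule covers nothing; stop rather than loop forever
        decision_list.append((matches, label))
        remaining = uncovered
    return decision_list


def classify(decision_list, features, default=None):
    """Return the label of the first rule that matches, in list order."""
    for matches, label in decision_list:
        if matches(features):
            return label
    return default


def toy_learner(remaining):
    """Stand-in for the rule evolver: cover exactly the first instance."""
    features, label = remaining[0]
    return (lambda f, target=features: f == target), label
```

Since every new rule is learned only from the instances still
uncovered, the working data set shrinks as the decision list grows.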

4.3  Exploiting  the  Hardware 

Recently, multimedia and scientific applications have pushed CPU
manufacturers to include support for vector instruction sets again
in their processors. Both application areas require heavy
calculations based on vector arithmetic. Simple vector operations
such as add or product are repeated over and over. During the 1980s
and 1990s, supercomputers, such as Cray machines, were able to
issue hardware instructions that took care of basic vector
operations. A more constrained scheme, however, has made its way
into general-purpose processors thanks to the push of multimedia
and scientific applications. The main chip manufacturers (IBM,
Intel, and AMD) have introduced vector instruction sets (Altivec,
SSE3, and 3DNow+) that allow performing vector operations over
packs of 128 bits in hardware. We will focus on a subset of
instructions that are able to deal with floating point vectors.
This subset of instructions implements, in hardware, vector
operations over groups of four floating-point numbers. These
instructions are the basis of the fast rule matching mechanism
proposed.

Figure 2: This figure illustrates the parallel model implemented.
Each processor is running an identical NAX algorithm. They only
differ in the portion of the population being evaluated. The
population is treated as a collection of chunks where each
processor evaluates its own assigned chunk, sharing the fitness of
these individuals with the rest of the processors. This approach
minimizes communication cost.

Our set of rules seeks both to correctly classify the prostate data set and to provide
biological insight. All the attributes of the domain are real-valued, so the conditions
of the rules need to be able to express conditions in an $\Re^n$ space. We use a rule
encoding similar to the one proposed by Wilson [33] and widely used in the GBML
community. Rules express the conjunction of tests across attributes. Each test can be
defined in multiple fashions but, without loss of generality, we pick a simple
interval-based one. A simple example of an if-then rule could be expressed as follows:

$$1.0 < a_0 < 2.3 \;\wedge\; \cdots \;\wedge\; 10.0 < a_n < 23 \;\rightarrow\; c_1 \qquad (1)$$

where the condition is the conjunction of the different attribute tests, as introduced
earlier, and the consequent is the predicted class. We also allow a special condition
(don't care), which always returns true, to allow generalized rules to evolve. The rule
below illustrates an example of a generalized rule.

$$1.0 < a_0 < 2.3 \;\wedge\; -3.0 < a_3 < 2 \;\rightarrow\; c_1 \qquad (2)$$

All attributes except $a_0$ and $a_3$ were marked as don't care.

Matching a rule requires performing the individual tests before the final and condition
can be computed. Vector instruction sets can help improve the performance of this
process by performing four tests at once; in effect, the process can be regarded as four
pipelines running in parallel. The process can be improved further by stopping the
matching process as soon as any one test fails. The code implemented assumes that the
two vectors containing the upper and lower bounds are provided and that records are
stored in a two-dimensional matrix. As also shown elsewhere [25], exploiting the
available hardware can speed up the matching process by a factor of between 3 and 3.5
[24].

4.4  Massive  Parallelism 

Since most of the time is spent on the evaluation of candidate rules when dealing with
large data sets, our next goal was to find a parallelization model that could take
advantage of this feature. Because rule evaluation is embarrassingly parallel [17], we
designed a coarse-grain parallel model for distributing the evaluation load. Cantú-Paz
[8] proposed several schemes, showing the importance of the trade-off between
computation time and time spent communicating. When designing the parallel model, we
focused on minimizing the communication cost. Usually, a master/slave scheme is a
feasible solution when the computation time is much larger than the communication time.
However, GBML approaches tend to use rather large populations, forcing us to send rules
to the evaluation slaves and collect the resulting fitness. This scheme also increases
the number of sequential instructions that cannot be parallelized, reducing the overall
speedup of the parallel implementation as a result of Amdahl's law [1].
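For reference, Amdahl's law in its standard form bounds the speedup $S$ obtainable on $p$ processors when only a fraction $f$ of the run time can be parallelized. The symbols $f$ and $p$ are our notation rather than the paper's, and the numbers in the comment are purely illustrative, not measurements from NAX.

```latex
S(p) = \frac{1}{(1 - f) + \dfrac{f}{p}}, \qquad \lim_{p \to \infty} S(p) = \frac{1}{1 - f}
% Illustrative values only: f = 0.95, p = 16 gives
% S(16) = 1 / (0.05 + 0.95/16) = 1 / 0.109375 \approx 9.14
```

Any sequential work added by a master/slave exchange lowers $f$ and therefore lowers the attainable speedup, which is the effect the paragraph above refers to.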

To minimize communication cost, each processor runs an identical NAX algorithm; all are
seeded in the same manner and hence perform the same genetic operations. They only
differ in the portion of the population being evaluated. Thus, the population is treated
as a collection of chunks, where each processor evaluates its own assigned chunk and
shares the fitness of the individuals in its chunk with the rest of the processors. In
this manner, fitness values can be encapsulated and broadcast, maximizing the occupation
of the underlying packing frames used by the network infrastructure. Moreover, this
approach removes the need for sending the actual rules back and forth between
processors, as a master/slave approach would require, thus keeping communication to the
bare minimum, namely the fitness. Figure 2 presents a conceptual scheme of the parallel
architecture of NAX.

Figure 3: The left-hand side presents the original labeled data contained in the P80
array. The right-hand side presents the reconstructed image based on the predictions
issued by the rule set evolved by NAX. Green represents non-cancerous tissue spots; red
represents malignant tissue spots.

To implement the model presented in Figure 2, we used C and the Open MPI implementation
of the Message Passing Interface [13]. Each processor computes which individuals are
assigned to it, computes their fitness and, finally, broadcasts the computed fitness.
The rest of the process is unchanged. Except for the cooperative evaluation, all the
processors generate the same evolutionary trace.
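The scheme can be illustrated without any message passing by simulating the processors in a single Python process. This is a sketch under stated assumptions, not the C/MPI code of NAX: `chunk_bounds`, `toy_fitness` and `simulate_generation` are hypothetical names, the fitness function is a placeholder, and the final concatenation stands in for the collective exchange of fitness values.

```python
import random

def chunk_bounds(pop_size, n_procs, rank):
    """Half-open index range [lo, hi) of the chunk owned by `rank`."""
    base, extra = divmod(pop_size, n_procs)
    lo = rank * base + min(rank, extra)
    return lo, lo + base + (1 if rank < extra else 0)

def toy_fitness(individual):
    """Placeholder evaluation; the real cost is matching rules against the data."""
    return sum(individual) / len(individual)

def simulate_generation(seed, pop_size, n_procs):
    # Every simulated rank seeds its generator identically, so all ranks
    # build the same population without exchanging any rules.
    populations = []
    for _ in range(n_procs):
        rng = random.Random(seed)
        populations.append([[rng.random() for _ in range(4)]
                            for _ in range(pop_size)])

    # Each rank evaluates only its own chunk of the population.
    partial = []
    for rank in range(n_procs):
        lo, hi = chunk_bounds(pop_size, n_procs, rank)
        partial.append([toy_fitness(ind) for ind in populations[rank][lo:hi]])

    # The fitness chunks are then shared with every rank; only these
    # numbers, never the rules themselves, would cross the network.
    fitness = [f for chunk in partial for f in chunk]
    return populations, fitness

populations, fitness = simulate_generation(seed=7, pop_size=10, n_procs=4)
```

Because every rank ends up with the full fitness vector and the same random state, selection, crossover and mutation proceed identically everywhere, which is why only the fitness values need to be exchanged.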

4.5  Lists  of  Maximally  General  and 
Maximally  Accurate  Rules 

One main characteristic of the so-called Pittsburgh-style learning classifier systems, a
particular type of GBML, is that the individuals encode a rule set [14, 22, 15]. Thus,
evolutionary mechanisms directly recombine one rule set with another. For classification
tasks of moderate complexity, the rule sets are not large. For complex problems,
however, the potential number of rules required to ensure accurate classification may
use prohibitively large amounts of memory. The requirements increase even further in the
presence of noise [23]. Hence, this family of GBML techniques works very well on
moderate-complexity problems [6, 3], but needs to be modified for complex and large data
sets.

A sequential rule learning approach may alleviate these requirements by evolving only
one rule at a time, hence reducing the memory requirements [9, 4]. This keeps the memory
footprint relatively small, which makes processing large data sets feasible. However, an
incremental approach to the construction of the rule set requires paying special
attention to the way rules are evolved. For each run of the genetic algorithm, we would
like to obtain a maximally general and maximally accurate rule, that is, a rule that
covers the maximum number of examples without making mistakes [32]. NAX (our proposed
incremental rule learner) evolves maximally general and maximally accurate rules by
computing the accuracy ($\alpha$) and the error ($\varepsilon$) of a rule [26]. In a
Pittsburgh-style classifier, the accuracy may be computed as the proportion of overall
examples correctly classified, and the error is the proportion of incorrect
classifications issued. Once the accuracy and error of a rule are known, the fitness can
be computed as follows:

$$f(r) = \alpha(r) \cdot \varepsilon(r)^{\gamma} \qquad (3)$$

where $\gamma$ is the error penalization coefficient. We have set $\gamma$ to 18 to
guarantee that the evolutionary process will produce maximally general and maximally
accurate solutions. Further details may be found elsewhere [24]. The above fitness
measure favors rules with a good classification accuracy and a low error, that is,
maximally general and maximally accurate rules. By increasing $\gamma$, we can bias the
search towards correct rules. This is an important element because assembling a rule set
based on accurate rules guarantees the overall performance of the assembled rule set.
NAX's efficient implementation of the evolutionary process is based on the techniques
described above: hardware acceleration (section 4.3) and coarse-grain parallelism
(section 4.4). The genetic algorithm used was a modified version of the simple genetic
algorithm [14] using tournament selection (s = 4), one-point crossover, and mutation
based on generating new random boundary elements.
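As a rough illustration of the fitness in Eq. 3, the sketch below scores two hypothetical rules. It is not NAX's code, and two assumptions deserve emphasis. First, the accuracy term is taken as the share of all examples handled correctly (matched examples of the rule's class plus unmatched examples of other classes). Second, for the product to reward rules that make no mistakes, the second term is computed as the fraction of matched examples that the rule classifies correctly, so it equals 1 for an error-free rule; the exact definitions are given in [24] and may differ. The function name and all counts are invented for the example.

```python
def rule_fitness(tp, fp, tn, fn, gamma=18):
    """Eq. 3 under the assumptions stated in the lead-in.

    tp / fp: matched examples whose class agrees / disagrees with the rule.
    tn / fn: unmatched examples of another class / of the rule's own class.
    """
    total = tp + fp + tn + fn
    matched = tp + fp
    alpha = (tp + tn) / total                    # examples handled correctly
    epsilon = tp / matched if matched else 0.0   # assumed: correct share of matches
    return alpha * epsilon ** gamma

# Hypothetical counts on 1000 examples (500 per class).
narrow_exact = rule_fitness(tp=200, fp=0, tn=500, fn=300)   # covers less, no mistakes
broad_sloppy = rule_fitness(tp=450, fp=50, tn=450, fn=50)   # covers more, 10% wrong
```

With the penalization coefficient at 18, the error-free rule scores higher than the broader rule that is wrong on a tenth of its matches; with a coefficient of 1 the ordering reverses, which illustrates how a large exponent biases the search towards correct rules.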

5.  RESULTS 

NAX has shown competitiveness in evolving rule sets that perform as accurately as the
ones evolved by other genetics-based machine learning and non-evolutionary machine
learning techniques. However, NAX's key element is the ability to deal with large data
sets. In this paper, we present preliminary results towards evolving a model capable of
correctly classifying pixels as cancerous or non-cancerous. The original array of spots
is presented in Figure 3(a). Each spot corresponds to a different biopsy sample from a
patient. The pixels present in each spot correspond to the epithelial tissue of the
biopsy; we suppress all other tissue types with a prior classification filter based on
Bayesian likelihood [7]. Each pixel of a spot is defined by 93 different metrics
extracted from the processed infrared spectra, as described in section 3. Finally, each
pixel in the array was labeled with the diagnostic class provided by a human
pathologist. Figure 3(a) presents in green all the non-cancerous pixels, while red
identifies cancerous ones.

Our goal with the initial experiments here was to demonstrate the usefulness of the
proposed approach to computer-aided diagnosis. We are currently planning mass
experimentation on several tissue arrays using the Tungsten cluster at the National
Center for Supercomputing Applications. These initial experiments were conducted on a
dual-core Intel Xeon 2.8 GHz Linux computer with 1 GB of RAM. NAX was run using both
processors. Training a model describing all the data took less than ten hours,
indicating that very competitive training times can be achieved by just using more
processors. The obtained model was able to correctly classify > 99.99% of the training
pixels. However, these results do not illustrate the generalization capabilities of the
models evolved by NAX. Hence, we ran a series of ten-fold stratified cross-validation
runs [34] to measure generalization and test performance of the evolved models. It is
important to mention that tools such as WEKA [34] and other off-the-shelf data miners
were not able to handle the volume of data required to evolve a model, either due to the
large memory footprint required or by not being able to provide an accurate model in a
feasible time period. In the cross-validation experiments, NAX correctly classified
87.34% of validation pixels. Such results are more than encouraging, because they show
that a human-competitive computer-aided diagnosis system is possible. Another
interesting property is that a few rules classify a large number of pixels (see Figure
4). Such a result is interesting for the interpretability of the model, since a small
number of rules have great expressiveness and hence may provide valuable biological
insight. Most importantly, they allow us to classify tissue accurately.

Figure 4: Performance of the evolved model as a function of the number of rules used.

Subsequent to this pixel-level classification, each circular spot in Figure 3 was
assigned as malignant or benign based on the majority class of the pixels in the sample.
We were able to accurately classify 68 of 69 malignant spots and 70 of 71 benign spots
in this manner. While human accuracy is difficult to quantify due to the variation
between persons, a generally accepted anecdotal figure is an error rate of about 5%. The
preliminary results we demonstrate here could potentially reduce that five-fold to about
1%, providing a solution to this real-world problem by a combination of novel
spectroscopy and advanced machine learning.
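The spot-level decision described above amounts to a majority vote over per-pixel predictions. A minimal sketch follows, with hypothetical spot names and toy pixel lists; how ties are broken is not stated in the text, so this version simply returns whichever class the counter reports first. The last lines compute the spot-level error implied by the counts quoted in the text.

```python
from collections import Counter

def spot_label(pixel_labels):
    """Label a spot with the most common predicted class among its pixels."""
    return Counter(pixel_labels).most_common(1)[0][0]

# Toy spots: lists of per-pixel predictions (hypothetical data).
spots = {
    "spot_a": ["malignant"] * 70 + ["benign"] * 30,
    "spot_b": ["benign"] * 55 + ["malignant"] * 45,
}
labels = {name: spot_label(pixels) for name, pixels in spots.items()}

# Spot-level error implied by the reported counts: 68 of 69 and 70 of 71.
correct = 68 + 70
total = 69 + 71
spot_error = 1 - correct / total
```

Two misclassified spots out of 140 correspond to a spot-level error of roughly 1.4%, consistent with the "about 1%" figure quoted in the text.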

6.  CONCLUSION 

In this manuscript, we present the application of advanced genetics-based machine
learning algorithms to a real-world problem of large scope, namely, the diagnosis of
prostate cancer. As opposed to subjective human recognition of disease in tissue using
light microscopy, we employed a chemical microscopy approach that required extensive
computation but provided a decision without human input. Our learning algorithm, based
on maximally general and maximally accurate rules, was scalable to very large data sets
and parallelized to provide learning and classification speed advantages. The algorithm
was able to classify a majority of pixels correctly, resulting in overall error rates
that were comparable to human examination, the current gold standard of care.

INTRODUCTION

The integration of FTIR spectroscopy with microscopy facilitates recording of spatially resolved
spectral information, allowing the examination of both the structure and chemical composition of
a heterogeneous material. While the first such attempt was over 50 years ago,1 present-day
instrumentation largely evolved from the point microscopy detection of interferometric signals
that developed in the mid-1980s. The successful coupling of interferometry for spectral recording
and microscopy for spatial specificity in these systems spurred interest in a variety of fields,
including the materials, forensic4 and biomedical arenas.5, 6 Point microscopy utilizes an
aperture to restrict radiation incident on a sample and permits the recording of spatially localized
data. The primary utilities of this form of microscopy lay in acquiring accurate spectra from
small samples, in determining the chemical structure and composition of heterogeneous
phases at specified points and in building a two-dimensional map of the chemical composition of
samples. Since the data were acquired at a single point, composition maps could only be
acquired by rastering the sample. Hence, the approach was termed mapping or point mapping
and involved as many spectral scans as the number of pixels in the map.

The use of focal plane array (FPA) detectors for microscopy allowed for the acquisition of
large fields of view in a single interferogram acquisition sweep. The multichannel detection
enabled by array detectors was similar to the concept of recording images with charge-coupled
devices in optical microscopy; hence, the approach was termed imaging. The unique advantages
of observing an entire field of view rapidly permitted applications that allowed monitoring of
dynamic processes, spatially resolved spectroscopy of large samples or many samples and
enhancement of spatial resolution due to retention of radiation throughput that was lost in point
microscopy systems due to diffraction at the aperture. Just as for the previous generation of
microspectroscopy instruments, applications rapidly followed in the materials9 and biomedical
fields.10-14 Research activity in this area can be divided into three major categories:
instrumentation and sampling methodologies, applications and data extraction algorithms. In this
manuscript, we review key advances and recent developments in the context of biomedical
imaging. We do not provide a comprehensive overview but selectively highlight certain features of
importance for cancer-related imaging. Last, we focus on one emerging application area, namely
tissue histopathology, and provide illustrative examples from our laboratory indicating the
integrative nature of the three categories in developing protocols.

INSTRUMENTATION, SAMPLING AND DATA HANDLING TECHNIQUES


Instrumentation 

Since imaging is largely based on new detectors with unique performance characteristics for
spectroscopy, efforts in instrumentation have largely focused on the efficient integration of FPA
detectors with interferometers. Due to the size, different electronics and unique noise
characteristics of FPAs, optimizing the data acquisition methodology was a primary activity
in the period when instrumentation first became available. The first rational attempt at
understanding performance and optimizing the data acquisition process revealed the unique noise
characteristics that limited the first generation of array detectors.15 Briefly, this paper established
that the general behavior of FTIR spectrometers largely holds for imaging spectrometers but
the detector may serve to limit the applicability of established practices in IR spectrometry. An
explicit optimization of the data acquisition time revealed several strategies for speeding data
collection for both the step-scan and rapid-scan modes.16 The first example of rapid-scan FTIR
imaging17 was conducted using asynchronous sampling, followed by descriptions of
synchronously triggered sampling and generalized methodologies that could use any detector at
any modulation frequency using post-acquisition techniques. Advances in detector technology
have now allowed rapid-scan imaging to become routine for large FPA detectors, while
innovative new detectors have been developed (first by PerkinElmer) that trade off the large
multichannel detection advantage of arrays against the speed of smaller detector arrays to
provide a very high performance instrument.19

At present, rapid-scan imaging has become the mode of choice for most manufacturers, and
detector sizes have proliferated from the classic 64 x 64 format to range from 16 x 1 to 256 x 256
formats (see figure 1). While the smaller detectors require rastering to image most samples and
can provide data of higher quality more efficiently, larger detectors are generally employed for
their large field of view and are useful for studying dynamics. It is interesting to note that the
linear array approach has an entirely different detector technology and considerations for
electronics compared to the two-dimensional FPAs. While it is beyond the scope of this article to
discuss the differences, the use of "macro" electronics that are offset from the actual detector and
the AC mode of operation are the two major differences that affect data. Consequently, comparisons
in performance are slightly more complicated. On the large-format FPA front, the latest advance
seems to be a detector developed jointly by NIH and FBI personnel in 2005. The detector can
operate at 16 kHz for 128 x 128 pixel snaps (Bhargava, Levin, Perlman and Bartick,
unpublished). This is in the speed regime of single-element detectors. Hence, the development
can truly lead to the acquisition of an entire image in a single interferometer mirror sweep in the
same time that it takes to acquire one spectrum with a benchtop IR spectrometer. To handle the
large data output, we designed on-chip co-addition and various corrections. We believe that
similar detector systems, operating in a fast regime and integrating processing with electronics,
are likely to be the technology of tomorrow for FTIR imaging.

The wide variety of instrumentation makes comparisons difficult, especially when manufacturers
provide different specifications for instruments. We have proposed a comparison index for these
systems based on performance per unit time. Recognizing that spectral resolution, time for
scanning, data processing (e.g. apodization) and resultant image size are the primary
determinants of performance, a measure can be formulated to describe performance. For a fixed
data processing scheme (filtering, apodization, etc.), the time taken to acquire 1 megapixel of
data at 8 cm⁻¹ resolution and a signal-to-noise ratio (SNR) of 1000:1 is found to be a good
measure. We would like to emphasize that this is the performance of the entire
imaging spectrometer and is not due to the detector alone. Efficient coupling of the interferometer
and optimization of the optical train will both affect performance, as will the correct setup of the
experiment. This index also does not consider the ease of use or "user-friendliness" of systems.
These are other important considerations and must also be weighed by organizations interested
in FTIR imaging technology. The issue of time resolution for acquiring data is one such concern.
The first approach is the kinetics approach, in which the interferometer is repeatedly scanned and
imaging data sets are sequentially acquired as quickly as possible. Clearly, rapid scan is favored
and the availability of fast-readout detectors is mandatory for fast events. The limit to this
method is the readout speed of the array (frames in ms), as interferometers can generally be
scanned fast enough and the integration time required is typically in the tens of microseconds
regime. An example is shown in figure 2 to demonstrate applicability in monitoring
polymerization kinetics.

Though rapid-scan imaging has displaced the step-scan mode in most new instrumentation, a
very important application of the step-scan approach remains in time-resolved imaging.20-22
Briefly, the method is applicable to systems that can be repeatedly and reproducibly excited and
relax back to their ground state. At each mirror retardation, the FPA is repeatedly triggered to
acquire data. At the same time, the sample is excited once and the dynamics of excitation and
decay of the excited state are monitored. Mirror stepping, data acquisition and sample excitation
are all precisely synchronized. Figure 3 demonstrates the synchronization. Time-resolved FTIR
imaging was first demonstrated using polymer-liquid crystal composites. Examples of the types
of data that may be obtained are also shown in figure 3. Last, the technology was extended to
provide significantly higher time resolution than could be obtained by the electronics of the
detector alone.23 While FPA detectors are slow compared to the single-point detectors used in
conventional FTIR spectroscopy, the cause is the need to read out data from several thousand
pixels and not the need to record data from all pixels. Hence, by staggering the data
recording time over multiple sample excitations, higher temporal resolution may be obtained.
With current detectors, a time resolution of ~30 µs should be possible.
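The staggering idea can be sketched as generic interleaved sampling. This is an illustration of the principle only, and the actual trigger scheme of the cited work may differ: the function name, the frame period, and the numbers of frames and offsets are all hypothetical. Each excitation cycle delays the acquisition trigger by a different fraction of the frame period, and the cycles are merged into a single time axis.

```python
def staggered_times(frame_period, n_frames, n_offsets):
    """Sample times (after excitation) collected over n_offsets excitation cycles.

    In cycle k the trigger is delayed by k * frame_period / n_offsets, so the
    cycles interleave into one time axis with spacing frame_period / n_offsets.
    """
    step = frame_period / n_offsets
    times = [k * step + j * frame_period
             for k in range(n_offsets)
             for j in range(n_frames)]
    return sorted(times)

# Hypothetical numbers: 1 ms between frames, 4 frames per cycle, 8 staggered cycles.
times = staggered_times(frame_period=1.0, n_frames=4, n_offsets=8)
spacing = times[1] - times[0]
```

In this toy case the merged axis is sampled every 0.125 ms even though any single cycle is sampled only every 1 ms. The price is eight excitations instead of one, which is why the method requires a sample that can be excited repeatedly and reproducibly.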

Sampling 

Interferometer  Issues 

Among the sampling configurations, the first clearly was the optimization of the microscope for
transmission and sampling. Unexpected issues were encountered in initial devices. For example,
the detector for the mono-wavelength laser provides a fringe pattern to allow for tracking mirror
retardation. The signal from this laser is measured by a small detector located at the center of the
beamsplitter (to minimize errors) with an arm that extends out to the edge. When imaged onto
the FPA, this laser detector leads to a pattern with low signal levels. Hence, the field of view is
not uniform, leading, in turn, to lower signal-to-noise ratios (SNR) for the affected region. Many
manufacturers have hence re-designed their spectrometers for imaging use. Another
manufacturer has avoided this issue by aligning their microscope to sample only the unaffected
part of the beam. Since the non-imaging spectrometer did not require imaging and the
interferometer was simply coupled to a microscope, these issues were slowly addressed.

Sampling Modes: Transmission, Transmission-reflection, Reflection and Attenuated Total Reflection

The vast majority of studies report the use of transmission sampling. Other major developments
have been the incorporation of reflective slides,24, 25, 26 the integration of ATR elements for both
microscopy and large-sample imaging, the integration of ATR technology with various
sample-forming accessories, grazing-angle accessories and multi-sample accessories. Reflective slides
actually result in reflection-absorption, which allows the beam to sample the signal twice, though
with a different phase and lower signal because half the objective is used for transmitting light
to the sample and the other half for acquiring light from it. A detailed theoretical
understanding of the confounding effects has not been published, though an example of a
possible data correction algorithm has been reported. ATR imaging is also highly prevalent and
is available as attachments to conventional imaging microscopes, as accessories that use the
sample chamber of the spectrometer, and as a solid immersion lens. We discuss examples of ATR
imaging next.

ATR 

In the Attenuated Total Reflection (ATR) mode, an IR-transmitting crystal of precise geometry
and high refractive index is employed as a solid immersion lens. Light is totally reflected at the
sample-crystal interface and an evanescent field penetrates into the sample to provide the
interaction to be observed using the traveling wave. Since the sample interaction is largely
determined by the lens and not by the sample, a precise and controlled depth of interaction is
available. The sample, however, needs to be in good contact to allow efficient coupling with the
evanescent wave. ATR imaging allows users to work with relatively thick sample sections that
do not require much sample preparation expertise or time. The first use of ATR imaging was
reported by Digilab in analyzing large samples that were not sectioned, as for transmission. ATR
imaging microscopy was demonstrated soon after, followed by other novel accessories. There
were other unpublished attempts that one of the authors is aware of: in 1999, for example,
Snively et al. (personal communication, unpublished) demonstrated imaging data from an
inverted ZnSe prism acting as a single-bounce ATR. Soon after, we employed a Ge crystal but
found the signal-to-noise ratio of the imaging system of that time to be very poor. In addition to
the ease of sample preparation, another major advantage of ATR imaging lies in improving the
limited spatial resolution of transmission microscopy. The authors assessed that they were able
to achieve a spatial resolution of 1 µm with a Ge internal reflection element.
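To make the "controlled depth of interaction" concrete, the sketch below evaluates the standard textbook expression for the penetration depth of the evanescent wave, d_p = λ / (2π n1 sqrt(sin²θ − (n2/n1)²)), where n1 and n2 are the refractive indices of the crystal and the sample and θ is the angle of incidence. Neither this expression nor the inputs come from the text: the indices (about 4.0 for Ge, about 2.4 for ZnSe, 1.5 for a generic organic sample), the 45° angle and the 6 µm wavelength are illustrative assumptions, and the function name is hypothetical.

```python
import math

def penetration_depth(wavelength_um, n_crystal, n_sample, angle_deg):
    """Evanescent-wave penetration depth, in the same unit as the wavelength."""
    theta = math.radians(angle_deg)
    root = math.sin(theta) ** 2 - (n_sample / n_crystal) ** 2
    if root <= 0:
        raise ValueError("angle is below the critical angle: no total internal reflection")
    return wavelength_um / (2 * math.pi * n_crystal * math.sqrt(root))

# Assumed inputs: 6 um light, 45 degree incidence, sample index 1.5.
d_ge = penetration_depth(6.0, n_crystal=4.0, n_sample=1.5, angle_deg=45.0)
d_znse = penetration_depth(6.0, n_crystal=2.4, n_sample=1.5, angle_deg=45.0)
```

Under these assumptions the depth is about 0.4 µm for Ge and about 1.2 µm for ZnSe. The depth is set mainly by the crystal and the geometry rather than by the sample thickness, which is why relatively thick sections can be used.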

Both micro and macro sampling have been extensively utilized. A spatial resolution of 3-4 µm
using a Ge ATR element was claimed based on more stringent criteria than used previously. Ge,
ZnSe and diamond30 crystals have been the materials of choice for most applications. In
particular, Kazarian and co-workers have extensively employed ATR-FTIR imaging for various
applications including drug release, polymer/drug formulations and biological systems.30-33 The
same group has provided other innovative sampling configurations for specific experiments,
including a compaction cell that allows compaction of a tablet directly on a diamond crystal with
subsequent imaging.34 The changes in the distribution of components in a tablet consisting of
hydroxypropyl methylcellulose (HPMC) and caffeine upon contact with water were studied. In this
manner, conventional dissolution measurements were combined with a concurrent assessment of the
compacted tablet structure. As opposed to the organic solvent-polymer dissolution experiments
reported earlier, this configuration allows for easy handling and imaging of water-induced
dissolution. The setup can also provide high-throughput analysis of materials under controlled
environments. A microdroplet sample deposition system was combined with a humidity control
device to image about 100 samples deposited on the surface of an ATR crystal simultaneously.
The approach was extended to 165 samples and was reported for studying the parallel dissolution of
formulations.

Multi-sample  Accessories  and  Sampling 

While imaging the structure of materials has been the primary focus of FTIR imaging, a number
of applications utilize the imaging of multiple samples. The first examples were from the field of
catalyst research. Typically, 2-12 samples could be imaged and analyzed under the same
conditions. High-throughput validation or method development was the primary goal in these
studies. Tissue microarrays (TMAs) provide the same function in biomedical imaging. TMAs
consist of tens to hundreds of samples arranged in a grid format. This allows for easy
visualization of the structure and classification accuracy across many patients and provides the
statistical measures needed for rigorous validation. The primary utility of the multisample image
in this case is to provide wide-ranging sampling and convenient archiving or data storage, not
necessarily to provide a higher throughput.14, 39 With the appropriate geometry, many samples
can be imaged to understand their dynamics in a concerted fashion. To accommodate the
samples, the field of view is often expanded. This results in a lower spatial resolution. For
imaging multiple samples, though, the spatial resolution can be conserved but temporal
resolution is restricted.

BIOMEDICAL  APPLICATIONS 

Bone 

Bone has been the tissue studied most by FTIR imaging. Bone composition changes with
development, environment, genetics, health and disease; bone is amenable to imaging at the
length scale resolved by FTIR imaging and has a limited chemical composition that is well
characterized by IR spectroscopy.40 For almost 30 years until the late 1980s,41 bone structure
was studied using single-element detectors in FTIR spectrometers. Typically, ground bone was
analyzed using the conventional KBr pellet method. This pellet method destroyed local structures,
precluding an understanding of molecular variations due to disease. Nevertheless, it was
sensitive to chemical composition and did provide useful information. With microscopy and now
with FTIR imaging, sample integrity is maintained and spectral information can be acquired at
anatomically discrete sites. From the resulting spectra, several important pieces of
information can be obtained, for example: a) the relative composition of hydroxyapatite and
collagen, from the ratio of the integrated ν1, ν3 phosphate and amide I bands (mineral:matrix
ratio); b) carbonate substitution, from the carbonate/phosphate ratio, calculated as the ratio of
the integrated ν2 carbonate peak (850-900 cm⁻¹) to the ν1, ν3 phosphate contour (900-1200
cm⁻¹); and c) crystallinity of the mineral phase, from the ratio of the peak intensities at 1030
and 1020 cm⁻¹.42 These assays illustrate several quantities important to bone research and
disease diagnosis that can be readily measured. Though a complete discussion is available in the
references,40, 42-44 we pick three illustrative examples demonstrating the applicability in
disease and in research.
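
The three assays listed above reduce to band integrations and an intensity ratio, which can be
sketched in a few lines. This is a minimal illustration, not the procedure of the cited work: the
phosphate and carbonate limits are those quoted in the text, whereas the amide I limits
(1585-1720 cm⁻¹), the straight-line baseline between band edges and all function names are
assumptions made for the example.

```python
def band_area(wavenumbers, absorbance, lo, hi):
    """Integrate absorbance over [lo, hi] cm^-1 above a straight baseline
    drawn between the two band edges (trapezoidal rule)."""
    pts = sorted((w, a) for w, a in zip(wavenumbers, absorbance) if lo <= w <= hi)
    if len(pts) < 2:
        raise ValueError("band needs at least two sampled points")
    (w0, a0), (w1, a1) = pts[0], pts[-1]
    area = 0.0
    for (xa, ya), (xb, yb) in zip(pts, pts[1:]):
        # Baseline height at each point, interpolated between the band edges.
        ba = a0 + (a1 - a0) * (xa - w0) / (w1 - w0)
        bb = a0 + (a1 - a0) * (xb - w0) / (w1 - w0)
        area += 0.5 * ((ya - ba) + (yb - bb)) * (xb - xa)
    return area


def absorbance_at(wavenumbers, absorbance, target):
    """Absorbance at the sampled wavenumber closest to `target`."""
    i = min(range(len(wavenumbers)), key=lambda k: abs(wavenumbers[k] - target))
    return absorbance[i]


# Phosphate and carbonate limits are quoted in the text; the amide I limits
# are an assumption made for this sketch.
PHOSPHATE = (900.0, 1200.0)
CARBONATE = (850.0, 900.0)
AMIDE_I = (1585.0, 1720.0)


def bone_metrics(wavenumbers, absorbance):
    """Return the three ratios described in the text for one pixel spectrum."""
    phosphate = band_area(wavenumbers, absorbance, *PHOSPHATE)
    carbonate = band_area(wavenumbers, absorbance, *CARBONATE)
    amide = band_area(wavenumbers, absorbance, *AMIDE_I)
    return {
        "mineral_matrix": phosphate / amide,
        "carbonate_phosphate": carbonate / phosphate,
        "crystallinity": absorbance_at(wavenumbers, absorbance, 1030.0)
        / absorbance_at(wavenumbers, absorbance, 1020.0),
    }
```

Applying `bone_metrics` to every pixel of an image would give one map per ratio, which is the
form in which such data are usually compared with histology.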

IR spectral analysis of healthy and diseased bone has been reviewed by Boskey et al.42 with
particular emphasis on changes in bone composition and in the physicochemical status of the
mineral and matrix of bone during osteoporosis, and on the effect of therapeutics on these
parameters. Osteoporosis, or porous bone, is a bone disease characterized by low bone mass and
structural deterioration of bone tissue. This leads to bone fragility and an increased
susceptibility to fractures, especially at the hip, spine and wrist. FTIR images of the mineral
content and crystallinity in trabecular bone of normal and osteoporotic samples clearly show that
the trabeculae in diseased tissue are thinner. Moreover, the mineral/matrix ratio in osteoporotic
bone is significantly reduced, whereas crystallinity is increased. These observations demonstrate
the potential and applicability of the technique to characterize diseased tissue. Bone mineral
changes between a healthy mouse model and a mouse model of Fabry disease (a lipid storage
disease in which globotriaosylceramide (Gb3) accumulates in tissues) were also analyzed.4 No
significant differences in the bone mineral properties were observed between Fabry and healthy
mice, which might reflect the similar lack of a major bone phenotype in human patients with
Fabry's disease and may also be related to the developmental age of these animals. The study
provides an example of the applicability to laboratory research.

Calcified tissue in biopsies from adults with osteomalacia has been studied.44 Osteomalacia
results in a deficiency of the primary mineralization of the matrix, leading to an accumulation
of osteoid tissue and a reduction in the bone's mechanical strength. A decrease in trabecular
bone content, with no change in matrix or mineral, is observed when iliac crest biopsies of
individuals with vitamin D-deficient osteomalacia are compared to normal controls. These findings
support the assumption that, in osteomalacia, the quality of the organic matrix and of the
mineral in the centre of the bone does not vary, whereas less-than-optimal mineralization occurs
at the bone surface.

Brain 

12 

Monkey brain tissues were among the first tissues examined using FTIR imaging.
Lately, the field has experienced a renaissance with applications to the human brain.
Grossly, brain can be divided into two types of matter, namely gray matter and white matter.
These names derive simply from their appearance to the naked eye. Gray matter consists of cell
bodies of nerve cells while white matter consists of the long filaments that extend from the cell
bodies - the "telephone wires" of the neuronal network, transmitting the electrical signals that
carry the messages between neurons. A visualization of the two compartments formed the first
demonstrative application of FTIR microspectroscopic imaging.

FTIR imaging and multivariate statistical analyses (unsupervised hierarchical cluster analysis)
were applied along with histology and immunohistochemistry in an animal model of
glioblastoma multiforme (GBM).45 GBM is a highly malignant human brain tumor that is
considered to be one of the most difficult to treat effectively.46 The authors were able to
identify the tumor growth as chemically distinct from the surrounding brain tissue. The
distribution of the absorbance of amide I in images highlighted high concentrations of proteins in
the corpus callosum and regions of the basal ganglia for healthy brain. Low absorbance was
generally observed in the cortex, whilst a higher absorbance was observed at the outer layer of
the cortex. For a GBM-bearing animal, the highest absorbance was found at the tumor site. In
contrast to healthy brain, a lower absorbance of the amide I band was observed at the corpus
callosum when compared to that in the cortex and the caudoputamen. The study demonstrates a
powerful application of simple analyses that can indicate disease. It also highlights the
multitude of spatial and spectral clues that can be used to diagnose or understand the disease.

In addition to primary disease sites, the diagnosis of metastatic spread from various cancers was
also reported.47 A multivariate classification algorithm was used to distinguish normal tissue
from brain metastases successfully and to classify the primary tumor of brain metastases from
renal cell carcinoma, lung cancer, colorectal cancer, and breast cancer. In the cluster-averaged
IR spectra from a brain metastasis of renal cell carcinoma, the main spectral differences among
the three tissue regions were observed from 950 to 1200 cm⁻¹ and from 1500 to 1700 cm⁻¹. The
band intensities at 1026, 1080 and 1153 cm⁻¹ are at a maximum in the spectrum of the black
cluster and at a minimum in the spectrum of the light gray cluster. The IR spectra of normal
brain tissue and of brain metastases of lung, breast and colorectal cancer were compared, and it
was found that these spectra do not contain the spectral features at 1026, 1080 and 1153 cm⁻¹
that are indicative of the presence of glycogen. It was concluded that these spectral features
could be considered biomarkers for brain metastases of primary renal cell carcinoma. In addition
to these three bands, spectral differences were observed for the bands at 1542 and 1655 cm⁻¹,
which arise from the amide II and amide I vibrations, respectively. It is clear from the results
that the maximum protein concentrations correlate with minimum glycogen concentrations in the
IR image. However, the protein and glycogen properties evident in the IR image are not visible in
the unstained cryosection. It is noteworthy that simple univariate analyses provide the final
clues to the disease. Even on application of multivariate techniques, the most prominent and
easily understood biomarkers of disease are those defined by conventional spectroscopic
knowledge as being important for identification, namely, spectral features and their
absorption.

In the cluster-averaged IR spectra of white matter from the three normal brain tissue samples,
intense bands at 1060, 1233, 1466, 1735, 2850 and 2920 cm⁻¹ due to the high lipid concentration
in white matter were noticed. Intensity changes were due to inter-sample and patient-to-patient
variances of the same tissue type. In addition, cluster-averaged IR spectra of brain metastases
(of renal cell carcinoma, breast cancer, lung cancer, and colorectal cancer) and of gray matter
of normal brain tissue were compared after baseline subtraction and normalization with respect
to the amide I band. Significant differences in the band positions, intensities and areas were
observed between these samples, and these were then used as candidates to differentiate
normal and tumor tissue and to identify the primary tumor. The authors used only eight
spectroscopic features in a linear discriminant analysis (LDA) model and correctly classified
three out of three normal brain tissue samples and 16 out of 17 brain metastasis samples. Hence,
though univariate analyses and features provide useful recognition, their integration into a
multivariate algorithm provides for automated recognition of clinical importance. It may also be
argued, however, that it is questionable whether the small numbers of samples employed represent
a true performance condition for the algorithm or are simply reflective of bias arising from the
clinical setting or sample sources. The advent of faster imaging approaches and advanced
sampling techniques like TMAs can allow larger numbers of samples to be analyzed, so that such
doubts about the validity of studies can be put to rest.
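
The concern about small sample numbers can be made concrete with a confidence interval on the
reported accuracies. The sketch below uses the Wilson score interval, which is our choice of
method for illustration and is not taken from the cited study; the function name and the 95%
level (z = 1.96) are likewise assumptions.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score confidence interval for a binomial proportion.

    z = 1.96 corresponds to an approximate 95% interval.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Counts quoted in the text: 16 of 17 metastases and 3 of 3 normal samples.
metastasis_ci = wilson_interval(16, 17)
normal_ci = wilson_interval(3, 3)
```

Under these assumptions, the interval for 16 of 17 spans roughly 0.73 to 0.99, and the interval
for 3 of 3 has a lower bound well below 0.5, which is why larger sample sets are needed before
performance estimates can be considered stable.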

Similarly, tissues from rat glioma models have been characterized and used to discriminate
healthy from tumor sections using principal component analysis and K-means clustering. The
reported pseudo-color maps were constructed from eight K-means clusters, each consisting of
similar spectra. The lipid/protein ratio (1466/1452 cm⁻¹) was found to be decreased, and the band
at 1740 cm⁻¹ became weak and almost vanished, as compared to the corresponding bands in the
healthy tissue. In addition to these differences, significant differences between healthy and
tumor-affected tissue were observed in the fingerprint region. In the healthy tissue, a weak band
at 1172 cm⁻¹, representing the stretching mode of C-O groups, was observed. A reduced intensity
as well as a shift of the peak to 1190 cm⁻¹ was noted for spectra of the tumor and of the tissue
surrounding the tumor. Tumor tissue was observed to have a decreased intensity of the asymmetric
phosphate stretching and C-C stretching and an increased intensity of the symmetric phosphate
stretching when compared to the healthy tissue. Variations in lipid features (methylene and
methyl stretching) were also observed. The major point here is that the entire spectrum contains
numerous points of difference between healthy and diseased tissue. Results were found to be in
agreement with those obtained from pathology.49 A structural difference around the tumor was
noted, which could be ascribed to the peritumoral edema observed during glioma development.
An increase in the permeability of the blood-brain barrier and an aggravation of the mass effect
of tumors are cited as the rationale for the edema associated with brain tumors. Fundamental
understanding can be enhanced by a complete understanding of the spectral differences, but
prediction algorithms need only a few measures of the spectral data to be effective.
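
The K-means step used to build such cluster maps can be sketched as follows. This is a generic
Lloyd's-algorithm illustration in plain Python, not the implementation used in the cited study;
the farthest-point initialisation is an assumption chosen to keep the example deterministic, and
each "spectrum" is simply a list of absorbance values (or principal component scores).

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length spectra."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(spectra, k, max_iter=100):
    """Cluster spectra into k groups; returns (labels, centers)."""
    # Farthest-point initialisation: start from the first spectrum, then
    # repeatedly add the spectrum farthest from all centers chosen so far.
    centers = [list(spectra[0])]
    while len(centers) < k:
        far = max(spectra, key=lambda s: min(sq_dist(s, c) for c in centers))
        centers.append(list(far))

    labels = None
    for _ in range(max_iter):
        # Assignment step: each spectrum joins its nearest center.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(s, centers[j])) for s in spectra
        ]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: each center becomes the mean of its members.
        for j in range(k):
            members = [s for s, lab in zip(spectra, labels) if lab == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

For a pseudo-color map like the one described, `k` would be 8 and each pixel would be colored by
its entry in `labels`.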


Breast 

Two major applications in breast tissue deal with complications arising from artificial
alterations of the tissue and with the evolution of cancer. While breast augmentation by implants
is highly prevalent, its complications have been discussed more recently. On the other hand, the
conventional method for diagnosing breast disease and evaluating its prognosis is a
histopathological examination of biopsy samples, a practice that has some shortcomings. For
breast implants, a major question is the containment of filling material, as its leakage can
lead to potential diseases. The silicone gel in implants is very different chemically from the
surrounding tissue, and its presence in tissue sections indicates a definite leak from the
implant, due either to material failure or to aging. A spectroscopic image50 generated from the
asymmetric stretching modes of the methyl groups attached to silicon in the gel allowed for the
examination of silicone in the tissue. Due to the unique chemical contrast employed in FTIR
imaging, such presence can be discerned within the tissue, even when optical microscopy
contrast is poor. An example of the presence of Dacron (a commercial name for poly(ethylene
terephthalate)) fixative patch threads in breast tissue was also shown.50 It was noted that the
technique is capable of rapid analysis within minutes of sectioning the tissue.

A few reports have also applied FTIR imaging to the diagnosis of breast diseases. Breast tumor
tissues were characterized by both FTIR imaging and point mapping techniques, and the advantages
of each over the other were evaluated.51 Similar comparisons had previously been reported for
polymeric materials, analyzing both static and dynamic samples.52 In a comparison of images from
the two methods, imaging data provided a clearer structure in the tumor area than the data
obtained from point mapping. Since breast tumor cells are ~10 µm in diameter, point mapping data
(with an aperture of 30 µm) would always contain contributions from the tumor cells as well as
from the other components surrounding the cells. The study clearly indicated that the
conventional point mapping approach can fail to detect a small number of malignant cells due to
its poor resolution capabilities. Nevertheless, the contamination problem, i.e., the spectral
contributions of other components surrounding the cell, was found to be less severe in the case
of ductal carcinoma in situ (DCIS). The study illustrates the need for matching the appropriate
level of spatial resolution to the task. While the 30 µm resolution may be appropriate for some
applications, it was clearly insufficient for detecting smaller numbers of cells.
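
A rough geometric estimate shows why a 30 µm aperture dilutes the signal from a ~10 µm cell. The
sketch assumes a circular cell, a square aperture and a signal proportional to the illuminated
area; these simplifications and the function name are ours, not those of the cited study.

```python
import math


def cell_fraction_of_aperture(cell_diameter_um, aperture_side_um, n_cells=1):
    """Fraction of a square aperture's area covered by n circular cells."""
    cell_area = math.pi * (cell_diameter_um / 2.0) ** 2
    aperture_area = aperture_side_um ** 2
    return min(1.0, n_cells * cell_area / aperture_area)


# One ~10 um tumor cell inside a 30 um x 30 um aperture.
fraction = cell_fraction_of_aperture(10.0, 30.0)
```

With these numbers, a single cell covers under a tenth of the aperture, so most of the recorded
spectrum would come from the surrounding material.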

Artificial neural network (ANN) and K-means cluster analyses have also been employed for the
classification of FTIR imaging data from normal and malignant immortalized human breast cell
lines. Normal cells, carcinoma cells, and mixed normal and carcinoma cells were used.
Differences in the spectral backgrounds between the training and test data were observed, which
compromise the reproducibility of recorded spectra and, thus, cause the classifier to fail. Using
rejection thresholds in the application of the ANN classifier was reported to be helpful in
identifying doubtful classifications. Another study54 reported imaging fibroadenoma, a benign
breast tumor. Data were evaluated using unsupervised cluster analysis of two spectral regions,
namely 1000-1500 and 2800-3000 cm⁻¹. The distribution of four main tissue components
(epithelium, retronuclear basal epithelial regions, mantle zone and distant connective tissue)
was visualized. The spectral features from each component were discussed in detail. Furthermore,
comparing epithelia from fibroadenoma and DCIS, the authors determined that subtle
distinctions between the IR characteristics of these two are reproducible. The initial study used
tissue from a single patient.

The work was recently extended55 to diagnose benign and malignant lesions from 22 patients.
The study utilized only spectra from well-defined tumor areas owing to the heterogeneity of
tissues. Based on the cluster analysis and on comparison with the H&E images, four classes of
distinct breast tissue spectra were identified: fibroadenoma (FA), ductal carcinoma in situ
(DCIS), connective tissue and adipose tissue. Further, ANNs were developed as an automated
classifier to differentiate the four classes. All spectra of connective tissue and adipose tissue
were classified correctly, as their spectral features are clearly different from each other and
from those of tumors. Differentiating fibroadenoma from DCIS was more difficult. A
top-level/sublevel strategy employing principal component analysis was further applied and
correctly differentiated 93% of fibroadenoma and DCIS spectra. From the mean spectra, it was
found that DCIS has a higher lipid content than fibroadenoma. Invasive ductal carcinoma (IDC)
could not be well characterized due to contamination from surrounding cells, illustrating the
limited spatial resolution.

Cervical  Cancer 

The cervix is the lower part of the uterus (womb), in which two major types of cancers occur:
squamous cell carcinoma and adenocarcinoma. About 80% to 90% of cervical cancers are
squamous cell carcinomas, and the remaining 10% to 20% are adenocarcinomas. Less commonly,
cervical cancers have features of both squamous cell carcinomas and adenocarcinomas. These
are called adenosquamous carcinomas or mixed carcinomas. Typically, the Papanicolaou (Pap)
test checks for changes in the exfoliated cells of the cervix to find the presence of any
infection, abnormal (unhealthy) cervical cells, or cervical cancer. FTIR spectroscopy,
microspectroscopy and FTIR imaging have been widely utilized to study cervical cancer and to
perform the same function using computer analyses of spectra.26, 56-60 While the first reports
in diagnosing cervical cancer are now generally not regarded as leading to solutions,56 two
groups have provided definitive proof of the potential of IR spectroscopy by careful microscopy
studies.26, 57, 45, 59, 60 When FTIR images of the amide I and asymmetric PO2⁻ stretching bands
were compared with the H&E-stained image, only a rough correlation with the pathological features
or cell types was obtained, whereas cluster maps of two, five and eight clusters resulting from
unsupervised hierarchical cluster (UHC) analysis of the whole spectrum demonstrated good
segmentation. With five clusters, most cell types are apparent upon correlation with the stained
image, including superficial (1), intermediate (2), parabasal (3), and connective tissue (5).
As in the univariate images, the connective tissue region (5) is split into two clusters.
Furthermore, by comparing the UHC analysis of the whole spectrum with that of only the amide I
region, the authors demonstrated that minimizing the spectral region for analysis and using
fewer clusters does not lead to the loss of useful information. Both univariate and multivariate
FTIR images of a sample with several endocervical ducts within the connective tissue were shown.
These endocervical ducts, lined with columnar endocervical cells, were apparent in all of those
images, even with only two clusters.

Cultures derived from cervical cancer cells (HeLa) are one of the most popular model systems
and have been studied using FTIR imaging.61 The cells were grown directly as sparse
monolayers on low-e slides. An FTIR image of the amide I band region showed large
differences in spectral intensities among the cells, even though these cells
are from a homogeneous and exponentially growing cell culture. Cluster analyses of normalized
spectra showed distinct differences that were not apparent in the univariate image. Similarly,
IR imaging with fuzzy C-means clustering and hierarchical cluster analysis was utilized to study
thin sections of cervix uteri encompassing normal, precancerous and squamous cell
carcinoma tissue. These studies demonstrate that IR imaging, in combination with multivariate
techniques, is capable of segmenting cervical tissues in a manner that is comparable to H&E
stained image differentiation and is significantly more sensitive in terms of the chemical
composition of the cells - whether it be due to metabolic or disease reasons.


Prostate 

Prostate cancer is the most prevalent internal cancer in the US. Hence, its pathologic diagnosis
and the correct interpretation of disease state are crucial.64 FTIR imaging has been proposed as
a solution that can potentially help pathologists by providing an objective and reproducible
assessment of disease in a manner that is easily understood by clinicians. It is also a good
model system for the development of FTIR imaging protocols. We first review progress in the field
and then describe efforts in our and our collaborators' laboratories towards formulating a
practical algorithm for prostate cancer pathology. While a number of studies examined human
prostate tissue with IR spectroscopy, microscopy approaches have recently been extensively
utilized both to study fundamental properties of prostate tissue and to determine structural
units in normal and disease states.69-75 An understanding of the tissue is now emerging as a
result of these studies. While the fundamental properties of the tissue are being examined, we
have focused on developing statistically validated diagnostic methods.

We have utilized high-throughput imaging with the express purpose of correlating spectra to
clinical practice.39, 64, 76 It is instructive to first examine the approaches of some previous
studies and then describe our approach in some detail. A variety of techniques have been reported
for analyzing prostate tissue, including unsupervised multivariate data analysis techniques such
as agglomerative hierarchical clustering (AH), fuzzy C-means (FCM), or k-means (KM) clustering
to construct infrared spectral maps of tissue structures.77 The results from these multivariate
techniques agreed with standard histopathological techniques and were found to be helpful for
identifying and discriminating tissue structures. Agglomerative hierarchical clustering was
found to be the best of the cluster imaging methods in terms of segmenting the
tissue. While these techniques comprise one end of the approach in using large spectral regions
and completely objective methods, the other extreme has also proven to be useful. In the second
paradigm, careful examination of the spectral data yields some measures that prove useful. For
example, the ratio of peak areas at 1030 and 1080 cm⁻¹, corresponding to the glycogen and
phosphate vibrations respectively, was utilized as a diagnostic marker for the differentiation of
benign from malignant cells.69 The authors concluded that the use of this ratio in association
with FTIR spectral imaging provides a basis for estimating areas of malignant tissue within
defined regions of a specimen. While it may be argued that the former approach is not based on
clinical knowledge and is more suited for discovery, it also involves the choice of a specific
number of clusters and their subsequent interpretation. The latter is based on a single parameter
whose utility for universal diagnoses remains to be tested. Nevertheless, these studies indicate
that both approaches provide information about the tissue that is useful.

Our approach has used elements from both pattern recognition and spectroscopic analyses of
univariate measures.39, 76 In all cases, one starts with the acquired imaging data (figure 4).
Since the data set is large (typically 10-1000 GB), it is advisable to reduce the dimensionality
of the data using some numerical procedure. Compression algorithms, principal components
analyses or simply storing only the information needed for classification (if the algorithm is
known) are useful. We sought expressly to relate the recorded IR imaging data to the clinical
knowledge base. Hence, we started with a model that is derived from clinical practice. Clearly,
the approach limits the discovery of new knowledge, but it assures the clinician that all
quantities of importance for diagnoses will be considered. The acquired data are labeled with
known cell identity or disease states. These pixels are best identified by a combination of very
careful manual labeling and tests for absorbance fidelity. Spectra from the labeled regions are
examined via averages, medians and standard deviations to determine a set of spectral features
that are descriptive of the major features of all spectra. We first note that the characteristic
IR absorbance spectra of the ten histological classes comprising prostate tissue look similar.
Though small differences in spectral features were observed at many frequencies, summary
statistics are limited in their examination of spectra for classification. Further, the small
differences indicate that noise and biological variability may render univariate measures less
reliable. The large number of classes usually implies that univariate analyses cannot distinguish
all histological classes present in the tissues, and hence the need for multivariate analyses is
apparent. Here the similarity of the spectral features for all classes works in our favor. Very
similar baseline points are obtained from an analysis of all spectra, and only subtle feature
differences are noted to distinguish the various class spectra. Hence, unknown spectra can be
processed in the same specified manner, without introducing any bias. Each of these features is
termed a metric to denote that it is a useful measure of the spectrum. Individual metrics can
allow segmentation of various tissue types if they are sufficiently different in a sampled
population.

We then employ the equivalent of a t-test in that the overlap between the absorbance
distributions of metrics is determined and equated to the error in prediction. The metrics are
arranged in the order of increasing overlap. Hence, we have an ordered set that differentiates at
least two classes. To obtain overall accuracy, we employ a modified Bayesian algorithm to
provide the probability of each class for every pixel. This fuzzy result is employed to determine
the area under the curve (AUC) of a receiver operating characteristic (ROC) curve. The ROC
curve is built from accepting the probability of each class at an increasing threshold that
varies between 0 and 1. For optimized threshold values, the fuzzy classification is turned into a
classified image, where each pixel is assigned a distinct class. We note that the method
incorporates analysis of all spectral features, a selection of the best features based on
statistical analysis of data and an optimal prediction of the class of each pixel based on an
objective selection rule from the fuzzy classification. The method is very powerful in that it
employs spectral features that are ordinarily employed by spectroscopists as metrics, which
permits a spectroscopic analysis of the basis of decision-making. Further, the method explicitly
obtains the fuzzy rule data for final classification. The value of the rule data for each class
is actually the probability of belonging to the class without consideration for the prior
prevalence of the class. Hence, the method can allow direct comparisons between performances for
different classes. The dependence of the process on various experimental parameters has also been
reported.
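
Two steps of the procedure above, ranking metrics by distribution overlap and building an ROC
curve from per-pixel class probabilities, can be sketched as follows. This is an illustration
under stated assumptions, not the published algorithm: a normalised-histogram overlap stands in
for the overlap measure, the bin count is arbitrary, the function names are hypothetical, and the
modified Bayesian classifier itself is not reproduced (any per-pixel probability can be passed
to `roc_auc`).

```python
def histogram_overlap(a, b, bins=20):
    """Overlap (0..1) between the normalised histograms of two samples."""
    lo, hi = min(min(a), min(b)), max(max(a), max(b))
    if hi == lo:
        return 1.0  # identical constant values: complete overlap
    width = (hi - lo) / bins

    def hist(values):
        counts = [0] * bins
        for v in values:
            counts[min(int((v - lo) / width), bins - 1)] += 1
        return [c / len(values) for c in counts]

    return sum(min(p, q) for p, q in zip(hist(a), hist(b)))


def rank_metrics(class_a, class_b):
    """Order metric names by increasing overlap between two classes.

    class_a and class_b map a metric name to that metric's values for the
    labeled pixels of each class.
    """
    return sorted(
        class_a, key=lambda name: histogram_overlap(class_a[name], class_b[name])
    )


def roc_auc(scores, labels):
    """Area under the ROC curve obtained by sweeping an acceptance threshold."""
    pos = sum(1 for y in labels if y)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
        points.append((fp / neg, tp / pos))
    # Trapezoidal integration of sensitivity against false positive rate.
    return sum(
        0.5 * (y0 + y1) * (x1 - x0) for (x0, y0), (x1, y1) in zip(points, points[1:])
    )
```

In use, `rank_metrics` would be run on the labeled training pixels to choose which metrics to
feed the classifier, and `roc_auc` would be evaluated once per class on held-out pixels.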

The  complication  inherent  in  translating  the  results  from  small  data  set  of  patients  to  clinical 
applications  is  well  recognized  in  the  spectroscopy  community.  The  variability  in  data,  arising 
from  variations  within  and  between  patients,  sample  preparation  and  handling,  is  likely  to 
provide  noisy  estimates  of  performance.  Hence,  statistical  stability  may  be  obtained  by 
examining  a  large  number  of  samples.  Similarly,  large  number  of  patients  may  be  employed  to 
provide  calibration  models,  likely  improving  the  robustness  of  the  developed  algorithm.  We  have 
described  a  high  throughput  sampling  method  from  tissues.14, 39,  76  Briefly,  the  approach  uses  a 
combinatorial  sampling  of  tissue  type  and  pathology  to  first  acquire  small  sections  of  tissues 
from  large  archival  cases.  These  small  sections  are  arranged  in  a  grid  pattern  and  placed  on  the 
same  substrate.  The  sample  is  termed  a  tissue  microarray  to  reflect  the  similarity  with  cDNA 
microarrays.  For  spectroscopic  imaging  and  the  development  of  automated  algorithms,  the 
approach  represents  a  large  number  of  cases  that  can  be  used  both  for  accurate  prediction 
algorithm  building  and  for  extensive  validations.  The  same  approach  is  likely  to  prove  useful  for 
extensions  to  determining  pathology.  Figure  5  demonstrates  the  typical  workflow  of  a  validation 
algorithm  and  methods  used  for  statistical  comparison.  We  strongly  suggest  a  variety  of  methods 
for measuring performance as each method has its own advantages and disadvantages. For
example, summary measures from ROC curves only provide information about accuracy but do
not indicate which class the inaccuracies arise from. Similarly, confusion matrices provide cross-class
information but do not provide global performance measures in the mold of ROC curves.
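
The complementary nature of the two measures can be illustrated with a minimal sketch. The pixel scores and labels below are hypothetical toy values, and the function names are introduced here for illustration only; the AUC is computed with the standard rank-sum identity rather than with any particular software package.

```python
from collections import Counter


def auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity.

    scores: classifier outputs, higher means more likely positive.
    labels: 1 for the positive class, 0 for the negative class.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Count positive/negative pairs ranked correctly; ties count one half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion_matrix(true_classes, predicted_classes):
    """Return a Counter keyed by (true, predicted) class pairs."""
    return Counter(zip(true_classes, predicted_classes))


# Hypothetical toy data: score for "epithelium" at six pixels.
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
labels = [1, 1, 1, 0, 0, 0]
truth = ["epithelium" if y else "stroma" for y in labels]
pred = ["epithelium" if s >= 0.5 else "stroma" for s in scores]

print("AUC:", auc(scores, labels))
print("Confusion matrix:", dict(confusion_matrix(truth, pred)))
```

The single AUC value summarizes overall ranking quality, whereas the confusion matrix shows which class each misassigned pixel came from and where it was placed.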

OUTLOOK 

FTIR  imaging  has  experienced  rapid  growth  in  the  past  10  years  and  is  increasingly  being 
applied  to  biomedical  tissue,  especially  for  the  analyses  of  cancer.  The  major  trends  emerging  in 
instrumentation include faster detectors and novel modes of data collection (e.g. time-resolved
imaging),  of  sampling  (e.g.  ATR)  and  application  areas.  For  biomedical  samples,  the  information 
content  is  quite  rich  and  is  often  available  through  simple  univariate  analysis.  For  more  complex 
applications,  e.g.  cancer  diagnoses,  the  data  acquisition,  sampling  and  data  analyses  must  be 
integrated  in  a  coherent  manner  to  provide  a  practical  solution.  We  anticipate  that  the  technology 
and its application to biomedical problems will continue to grow with the cooperation of
instrument  manufacturers,  applications  scientists,  numerical  methods  developers  and 
communities  that  can  utilize  the  information  effectively,  e.g.  pathologists  or  surgeons. 
Figure 1. Various MCT FPA detectors employed for FTIR imaging since the first reports using Santa
Barbara Focalplane (SBFP) array detectors. The years in parentheses are the first reports of use for
FTIR imaging. Perkin-Elmer introduced the concept of utilizing a small linear array for very high signal-to-noise
ratios, an approach that has since been adopted by Thermo. Our research efforts have involved
the use of a high end, custom-built detector that allows for fast imaging.
Figure 4. Organization of data into a prediction algorithm involves several steps. Acquired FTIR imaging data (top, left) is reduced by
manual  selection  to  a  set  of  features  that  capture  the  essential  elements  of  spectra  from  all  tissue  types.  A  model  (top,  right)  is  selected 
for  the  data  and  employed  to  develop  an  algorithm.  The  algorithm  is  applied  to  the  entire  metric  set  and  prediction  capabilities  are 
optimized.  Results  of  the  optimization  provide  an  optimal  metric  set  for  validation  studies,  the  parameters  of  the  algorithm  to  be  applied 
and  calibration  classification  statistics.  The  optimized  algorithm  is  applied  to  acquired  data  without  supervision  (figure  5). 
confidence in the development of robust algorithms.
ABSTRACT

Fourier transform infrared (FT-IR) spectroscopic imaging is an emerging technique that combines the molecular
selectivity  of  spectroscopy  with  the  spatial  specificity  of  optical  microscopy.  We  demonstrate  a  new  concept  in  obtaining 
high  fidelity  data  using  commercial  array  detectors  coupled  to  a  microscope  and  Michelson  interferometer.  Next,  we 
apply  the  developed  technique  to  rapidly  provide  automated  histopathologic  information  for  breast  cancer.  Traditionally, 
disease  diagnoses  are  based  on  optical  examinations  of  stained  tissue  and  involve  a  skilled  recognition  of  morphological 
patterns  of  specific  cell  types  (histopathology).  Consequently,  histopathologic  determinations  are  a  time  consuming, 
subjective  process  with  innate  intra-  and  inter-operator  variability.  Utilizing  endogenous  molecular  contrast  inherent  in 
vibrational  spectra,  specially  designed  tissue  microarrays  and  pattern  recognition  of  specific  biochemical  features,  we 
report  an  integrated  algorithm  for  automated  classifications.  The  developed  protocol  is  objective,  statistically  significant 
and,  being  compatible  with  current  tissue  processing  procedures,  holds  potential  for  routine  clinical  diagnoses.  We  first 
demonstrate  that  the  classification  of  tissue  type  (histology)  can  be  accomplished  in  a  manner  that  is  robust  and  rigorous. 
Since  data  quality  and  classifier  performance  are  linked,  we  quantify  the  relationship  through  our  analysis  model.  Last, 
we  demonstrate  the  application  of  the  minimum  noise  fraction  (MNF)  transform  to  improve  tissue  segmentation. 

Keywords: Breast Cancer, FT-IR Spectroscopy, Hyperspectral, Histopathology, Imaging, Diagnostics, MNF Transform


1.  INTRODUCTION 

As  histologic  analysis  of  biopsied  tissue  forms  the  standard  in  definitive  diagnosis  of  breast  lesions,  it  is  estimated  that 
more  than  1.6  million  women  undergo  breast  biopsies  each  year  in  the  US  alone.  Biopsy  samples  are  fixed  to  ensure 
tissue  stability1  and  then  sectioned  for  staining.2  Microscopic  examinations  of  stained  tissue  sections  by  a  trained 
pathologist  are  the  gold  standard  used  in  diagnosing  breast  cancer.1  Unfortunately,  these  evaluations  are  time  consuming4 
and  do  not  always  lead  to  an  unequivocal  diagnosis.  For  example,  a  study  of  481  breast  cancer  patients  from  1982-2000 
at  a  regional  cancer  center  indicated  that  73%  of  ductal  carcinoma  in  situ  (DCIS)  patients  are  referred  by  a  general 
pathologist  to  an  expert  pathologist  for  review.5  After  review,  43%  of  these  cases  received  different  treatment 
recommendations.  Another  study  found  that  52%  of  cases  referred  to  a  multidisciplinary  tumor  review  board  received 
different  surgery  recommendations.6  Clearly,  the  diagnostic  process  is  sub-optimal.  Rapid,  objective  second  opinions  are 
desirable.  The  use  of  emerging  biological  understanding  and  technologies  for  diagnoses  could  provide  additional 
information  in  tumor  evaluation  and  help  make  accurate  therapy  decisions.  Further,  it  is  likely  that  the  morphologic 
parameters  of  current  diagnoses  are  insufficient  and  additional  information  must  be  added.  This  information  is  typically 
biochemical  in  nature.  For  example,  staining  for  human  epidermal  growth  factor  receptor  2  (HER2)  can  identify  25-30% 
of  breast  cancers.7  Such  examples  of  success,  unfortunately,  are  uncommon  for  cancers  in  complex  tissues.  Hence, 
alternative  methods  are  urgently  required  to  aid  diagnostic  pathology. 

One  such  means  is  the  use  of  molecular  spectroscopy.  For  example,  Fourier  transform  infrared  (FT-IR)  spectroscopy  is 
traditionally  used  for  molecular  identifications  and  biomolecular  structure  elucidations,  but  is  not  currently  applied  in 
clinical  pathology.8  An  IR  spectrum  provides  a  unique  molecular  fingerprint  with  a  quantitative  measure  of  the 
molecular  bonds  present  in  an  examined  material.1’  Thus  it  should  give  a  reproducible  measurement  of  tissue 
composition. Tissue, however, is microscopically heterogeneous and the measurement of chemical composition must be
made  in  the  context  of  knowledge  of  tissue  structure  (histology).10  The  recent  emergence  of  FT-IR  imaging  couples 
spectroscopy  and  microscopy  to  permit  rapid  acquisition  of  spectra  from  tens  of  thousands  of  pixels  at  a  high  spatial 
resolution. Each pixel (spectrum) typically contains thousands of data points in the mid-IR wavelength region (2-12 µm).11
Automated classification can then be employed for rapid computerized tissue image analysis, as has been
practiced  in  both  the  spectral  processing  and  image  processing  communities.  The  end  goal  of  the  measurement  and 
associated  data  processing  steps  is  to  permit  the  rapid  segmentation  of  different  types  of  tissue  without  the  need  for 
chemical  dyes  or  contrast  agents.10  Last,  the  use  of  FT-IR  imaging  only  involves  light  interacting  with  a  sample  and, 
unlike  conventional  biochemical  analysis  methods,  does  not  alter  the  tissue  in  any  manner.  Thus  it  can  provide  additional 
information  for  pathology  without  the  necessity  of  additional  materials,  tissue  samples  or  changes  in  clinical  protocols. 

In  this  manuscript  we  use  breast  tissue  as  an  example  to  illustrate  the  application  of  FT-IR  imaging  coupled  with 
computerized  classification  for  histopathology.  Specifically,  we  demonstrate  that  a  combination  of  FT-IR  imaging, 
classification  algorithms  and  integrated  computational  methods  for  enhancement  of  acquired  data  can  be  used  in  tandem 
to  optimize  the  development  of  practical  protocols  for  automated  histopathology.  Previous  studies  report  on  the  potential 
for IR spectroscopy in breast pathology,12,13,14,15,16,17 but no complete study on the spectral features of different histologic
types of breast tissue exists. Preliminary efforts indicate significant spectral variation between different types of breast
tissue and breast tumors,18,19,20 but a protocol for clinical translation is lacking. We combine fast FT-IR imaging and
tissue microarray sampling to demonstrate the effectiveness of our approach for automated breast histopathology on
normal and malignant tissue from five patients. This approach is distinct from that in Raman spectroscopy, where
histologic models are used in analyzing spectra.21,22 As a first step towards automated tissue segmentation, we
distinguish  breast  stroma  and  epithelium.  This  is  a  critical  step,  as  over  99%  of  breast  tumors  arise  in  the  epithelial  tissue 
lining  milk  ducts  and  lobules.23  False  color  classified  images  denoting  stroma  and  epithelium  are  produced,  followed  by 
analysis  of  data  collection  parameters.  We  evaluate  the  impact  of  spectral  resolution  and  noise  on  classification  accuracy 
to  demonstrate  potential  for  faster  data  acquisition  without  loss  in  classification  confidence.  This  study  presents  an  initial 
effort  in  developing  applications  for  FT-IR  imaging  in  clinical  pathology. 


2.  METHODOLOGY 


2.1  Data  Acquisition 

The  first  studies  to  examine  IR  spectra  of  tissue  began  over  fifty  years  ago,24  but  the  field  did  not  truly  make  progress 
due  to  limitations  in  instrumentation.  Today,  a  combination  of  an  IR  microscope,  Michelson  interferometer  and  focal 
plane  array  (FPA)  detector25  permits  efficient  data  acquisition  for  large  sample  areas.  The  data  presented  in  this  study  is 
collected using the Perkin-Elmer Spotlight 400 imaging spectrometer. A spatial pixel size of 6.25 µm and a spectral
resolution of 4 cm⁻¹ were employed, with 2 scans averaged for each pixel. An IR background is collected with 120 scans
co-added at a location on the substrate where no tissue is present. No undersampling was employed in data acquisition
and an NB medium apodization function was used. A ratio of the background to tissue spectra is then computed to remove
substrate  and  air  contributions  to  the  spectral  data.  The  Spotlight  software  atmospheric  correction  algorithm  is  applied  to 
eliminate  remaining  atmospheric  contributions  to  the  tissue  spectra.  As  opposed  to  other  configurations  that  employ  a 
large  FPA  detector,  this  instrument  employs  a  linear  array  detector  that  is  raster  scanned  to  acquire  data  from  large 
sample  areas.  We  use  a  combination  of  instrument  control  and  post-processing  software  to  computationally  re-organize 
data  acquired  into  large  image  sizes.  Images  of  stained  tissue  are  acquired  using  a  standard  Zeiss  optical  microscope. 
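
The background ratio step can be sketched as follows, assuming the conventional absorbance definition A = -log10(I / I0), where I is the single-beam intensity through tissue and I0 is the background intensity. The instrument software's actual processing is not described here, and the intensities and function name are hypothetical.

```python
import math


def absorbance(sample, background):
    """Convert single-beam intensities to absorbance, A = -log10(I / I0)."""
    if len(sample) != len(background):
        raise ValueError("spectra must share the same wavenumber axis")
    return [-math.log10(i / i0) for i, i0 in zip(sample, background)]


# Hypothetical single-beam intensities at three wavenumbers.
background = [1000.0, 1200.0, 900.0]  # substrate only, no tissue
sample = [1000.0, 120.0, 450.0]       # tissue pixel

spectrum = absorbance(sample, background)
print([round(a, 3) for a in spectrum])
```

Because the same background is divided out of every pixel, contributions common to both measurements (substrate, source profile, air path) cancel in the ratio.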

2.2  Tissue  sampling 

Tissue  microarrays  (TMAs)  permit  facile  comparison  of  small  tissue  samples  from  numerous  patients26  and  are  an 
especially  useful  sampling  medium  for  spectroscopic  analyses.27  A  TMA  contains  numerous  small  round  tissue  samples, 
termed  cores,  which  are  extracted  from  biopsy  samples  from  different  patients.  Two  paraffin-embedded  TMAs  were 
obtained  from  a  commercial  source  (US  Biomax)  for  this  study.  The  first  TMA  section  is  placed  on  a  glass  slide  and 
stained with hematoxylin and eosin (H&E) dyes. In H&E staining, hematoxylin stains nucleic acids and eosin stains
protein-rich  tissue  regions.  This  section  is  used  for  visual  morphology  interpretation  by  a  pathologist.  The  second  TMA 
section  is  placed  on  a  barium  fluoride  (BaF2)  substrate  for  FT-IR  imaging.  Though  the  arrays  contained  a  large  number 
of samples, a smaller subset of malignant and normal tissue cores from five patients with invasive ductal carcinoma
(IDC) is selected for this study as the illustrative example. Each of the ten cores is 1.5 mm in diameter; hence, at a 6.25 µm
pixel size, approximately 280,000 spectra are collected for each core. This results in the collection of over 560,000
spectra  for  each  patient  and  approximately  2.8  million  total  spectra  for  all  ten  cores.  This  large  spectral  dataset  facilitates 
rigorous  validation  of  classification  protocols  at  a  pixel  level.  Paraffin  is  removed  from  the  TMA  by  immersion  in 
hexane  with  continuous  stirring  at  40  °C  for  48-72  hours.  Spectra  are  recorded  at  several  locations  on  the  TMA  every  24 
hours during this period to monitor paraffin removal with the disappearance of the 1462 cm⁻¹ peak.


2.3  Image  analysis  and  classification 


A  supervised  segmentation  method  is  used  for  FT-IR  image  classification.  This  algorithm  has  been  described  in  detail 
elsewhere,28  but  is  based  on  a  modified  version  of  a  Bayesian  classifier.  First,  the  spectral  profile  of  1641  bands  is 
reduced  to  a  set  of  89  useful  metrics  by  examination  of  spectra  from  manually  selected  stroma  and  epithelium  tissue 
regions.  Metrics  are  manually  selected  to  include  peak  ratios,  peak  areas,  and  peak  centers  of  gravity.  A  metric  profile  M 
is  generated  for  each  pixel  in  each  tissue  image  of  the  form 

$$M = [m_1, m_2, m_3, \ldots, m_{n_m}], \qquad n_m = 89 \tag{1}$$


where each $m_i$ is the value for a single metric and $n_m$ is the total number of manually selected metrics. Frequency
distributions for stroma and epithelium are determined for each metric and used to estimate the probability of a given
metric profile representing either of these two classes. The probability that a pixel with a given metric profile
belongs to each class $c_i$ is determined using Bayes' Rule


$$p(c_i \mid M) = \frac{p(M \mid c_i)\, p(c_i)}{p(M)} \tag{2}$$


where $p(M \mid c_i)$ is estimated from the metric class frequency distributions and $p(M)$ is the probability of a given metric
profile. The prior probability of a particular tissue class $p(c_i)$ in this model cannot be determined due to the manual
selection of tissue classes on FT-IR images, and is estimated as 0.5. Other ways to estimate or optimize the class prior
probability  may  be  utilized;  we  have  noticed  anecdotally,  however,  that  the  choice  of  this  value  across  a  large  range  does 
not  significantly  affect  the  classification  results.  Classification  accuracy  is  estimated  with  receiver  operating 
characteristic  (ROC)  analysis  for  selected  tissue  regions.  The  area  under  the  ROC  curve  (AUC)  is  used  to  evaluate 
classifier  sensitivity  and  specificity  and  estimate  the  potential  of  the  algorithm  for  accurate  histology  determinations.  The 
classification  algorithm  is  trained  on  a  large  array  dataset  and  separately  validated  on  a  second  array.  It  is  notable  that  we 
do  not  develop  the  entire  classification  algorithm  anew  here.  First,  the  central  idea  of  this  manuscript  is  to  demonstrate 
the  optimization  of  a  developed  protocol  and  second,  the  sample  sizes  chosen  here  are  insufficient  for  de  novo  algorithm 
development.  Data  is  analyzed  using  the  Environment  for  Visualizing  Images  (ENVI)  software  and  with  programs 
written  in-house  using  Interactive  Data  Language  (IDL). 
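
The classification step can be sketched in a few lines. This is not the modified Bayesian classifier of reference 28; it is a minimal illustration of Bayes' rule with histogram-based likelihoods, and it assumes that metrics are independent, that add-one smoothing is acceptable, and that class priors are equal at 0.5. All names and training values are hypothetical.

```python
def histogram_likelihood(values, edges):
    """Estimate a discrete likelihood p(bin | class) from training values."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for b in range(len(counts)):
            if edges[b] <= v < edges[b + 1]:
                counts[b] += 1
                break
    total = sum(counts)
    # Add-one smoothing (an assumption) avoids zero likelihoods.
    return [(c + 1) / (total + len(counts)) for c in counts]


def bin_index(value, edges):
    """Index of the bin containing value, clamped to the histogram range."""
    for b in range(len(edges) - 1):
        if value < edges[b + 1]:
            return b
    return len(edges) - 2


def posterior(profile, likelihoods, edges, priors):
    """Bayes' rule with metrics treated as independent (an assumption).

    profile: metric values [m_1, ..., m_n] for one pixel.
    likelihoods: {class: [one histogram per metric]}.
    edges: [bin edges for each metric].
    """
    joint = {}
    for c, hists in likelihoods.items():
        p = priors[c]
        for m, hist, e in zip(profile, hists, edges):
            p *= hist[bin_index(m, e)]
        joint[c] = p
    evidence = sum(joint.values())  # p(M)
    return {c: p / evidence for c, p in joint.items()}


# Hypothetical training values for a single metric.
edges = [[0.0, 1.0, 2.0, 3.0]]
train = {"stroma": [0.2, 0.5, 0.7, 1.5], "epithelium": [2.1, 2.5, 2.9, 1.2]}
likelihoods = {c: [histogram_likelihood(v, edges[0])] for c, v in train.items()}
priors = {"stroma": 0.5, "epithelium": 0.5}

post = posterior([2.4], likelihoods, edges, priors)
print(post)
```

With equal priors the posterior is driven entirely by the likelihood ratio, which is consistent with the observation above that the choice of prior has little effect on the results over a wide range.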


2.4  Spectral  resolution  and  noise  analysis 

Spectral  resolution  and  noise  are  two  common  experimental  variables  that  affect  results  in  IR  spectral  analyses.  The 
effects  of  spectral  resolution  and  spectral  noise  are  evaluated  here  in  the  context  of  quantitative  histologic  segmentation 
to minimize data collection time. As per the trading rules of IR spectroscopy, data collection time is expected to decrease
linearly as spectral resolution is coarsened and quadratically with reduction in signal-to-noise ratio (SNR).29 Ideally, these
parameters  would  be  analyzed  by  acquiring  data  at  different  spectral  resolutions  and  numbers  of  spectral  co-adds. 
However,  the  time  required  to  collect  multiple  images  for  the  TMA  is  prohibitive.  Instead,  computational  methods  are 
used to examine these parameters using the original FT-IR images acquired at 4 cm⁻¹ and 2 scans per pixel. First, spectral
resolution is evaluated by downsampling the data using a neighbor binning procedure to resolutions of 8, 16, 32, 64 and
128 cm⁻¹. Classification is then performed on downsampled datasets to determine the coarsest spectral resolution needed
for satisfactory stroma and epithelium segmentation. For a fine spectral resolution data set at 4 cm⁻¹, the effect of noise is
evaluated  by  adding  to  each  spectrum  noise  in  Gaussian  distributions  with  standard  deviations  of  0.001,  0.01,  and  0.1  au. 
Classification  accuracy  is  estimated  by  evaluating  the  AUC  at  each  noise  standard  deviation.  Computational  noise 
reduction  with  the  minimum  noise  fraction  (MNF)  transform30  is  evaluated  by  reducing  noise  in  all  the  data  sets. 
Classification  is  performed  with  the  same  algorithm  on  these  MNF  transformed  images  to  determine  the  impact  of  this 
noise  reduction  algorithm  on  stroma  and  epithelium  segmentation. 
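
The two computational manipulations described in this section can be sketched as below. This is a simplified illustration, not the in-house IDL implementation: binning is taken to mean averaging adjacent spectral points, which coarsens the point spacing and only approximates a true change in instrument resolution, and the spectrum values and function names are hypothetical.

```python
import random


def bin_spectrum(spectrum, factor):
    """Average adjacent points in groups of `factor` (neighbor binning)."""
    usable = len(spectrum) - len(spectrum) % factor  # drop a ragged tail
    return [sum(spectrum[i:i + factor]) / factor
            for i in range(0, usable, factor)]


def add_gaussian_noise(spectrum, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma (au)."""
    return [a + rng.gauss(0.0, sigma) for a in spectrum]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
spectrum = [0.10, 0.12, 0.30, 0.34, 0.20, 0.18, 0.05, 0.07]

coarse = bin_spectrum(spectrum, 2)               # halves the number of points
noisy = add_gaussian_noise(spectrum, 0.01, rng)  # sigma = 0.01 au

print([round(a, 3) for a in coarse])
print([round(a, 3) for a in noisy])
```

Applying `bin_spectrum` with factors of 2, 4, 8, 16 and 32 to data sampled for 4 cm⁻¹ resolution would mimic the 8 to 128 cm⁻¹ series under these assumptions.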
3. DATA


The  classification  model  presented  in  this  manuscript  involves  segmentation  of  stroma  and  epithelium,  which  are  the  two 
most  prominent  tissue  classes  in  fixed  breast  tissue  used  for  pathology  evaluation.31  In  practice,  the  recognition  of 
epithelial  cells  is  especially  critical  for  cancer  diagnoses,  as  the  vast  majority  (>99%)  of  breast  cancers  arise  in  this  cell 
type.23  Hence,  the  two  class  model  is  of  practical  significance.  While  seemingly  simple  and  practical,  however,  the 
model  can  potentially  be  confounding  as  stroma  consists  of  many  cell  types  with  disparate  spectral  characteristics.  This 
model  was  employed  to  develop  a  classifier  using  training  data  from  a  TMA  with  forty  patients.  Final  model  calibration 
for  sixty  eight  tissue  cores  yielded  an  AUC  value  of  0.99  with  an  eight  metric  classifier.32’33  In  this  study  we  validate  this 
classifier  with  one  malignant  and  a  matched  normal  TMA  core  from  a  subset  of  five  patients.  As  seen  in  Figure  1A  and 
B,  absorbance  images  based  on  spectral  features  closely  compare  with  images  of  H&E  stained  tissue.  Hence,  using 
conventional  pathology  knowledge  we  can  select  image  pixels  that  unequivocally  correspond  between  the  two  images  - 
representing both stroma and epithelium. These pixels are selected by examining FT-IR images at 1080 cm⁻¹ to highlight
asymmetric PO2 stretching vibrations in glycoprotein in epithelium,14 1236 cm⁻¹ to highlight CH2 wagging vibrations
associated with collagen proteins,34 1652 cm⁻¹ to highlight C=O stretching vibrations at the protein amide I mode,34 and
3292 cm⁻¹ to highlight NH bending vibrations at the protein amide A mode (shown as an example in Figure 1B).35 We
emphasize  that  multiple  vibrational  modes  must  be  examined  in  tandem  and  pixels  identified  with  great  care  and 
diligence  as  these  form  the  gold  standard  for  future  comparisons.  Over  185,000  pixels  are  marked  in  these  ten  tissue 
cores  to  serve  as  the  gold  standard  for  ROC  analysis  (as  shown  in  Figure  1C).  Selecting  this  large  set  of  pixels  is 
important  to  achieve  a  reasonable  sample  size  to  accurately  estimate  classification  potential  for  the  entire  data  set. 
Boundary  pixels  are  not  marked  to  avoid  errors  associated  with  mixed  pixels  in  FT-IR  images.27  A  qualitative 
comparison of stained and classified images indicates that stroma and epithelium segmentation is reasonable (Figure 1D),
and this is confirmed with an AUC value of 0.98 after quantitative ROC analysis. Stroma and epithelium are easily
identified  on  false  color  classified  images  without  detailed  examination  and  interpretation.  This  is  advantageous  over 
traditional  staining  methods  that  require  the  use  of  chemical  dyes  and  subsequent  expert  pathologist  examination  for 
evaluation. 


Fig.  1.  Conventional  H&E  stained  images,  FT-IR  spectral  images  and  classification.  (A)  An  H&E  stained  image  of  tissue 
cores  from  five  invasive  ductal  carcinoma  patients.  Each  row  represents  a  single  patient,  with  malignant  tissue  samples 
on the left and normal samples on the right. (B) An FT-IR image at 3292 cm⁻¹ denotes the NH bending vibration at the
amide A protein mode. Brighter regions denote relatively protein-rich stroma. (C) A ground truth FT-IR image with
pixels  marked  as  stroma  or  epithelium  serves  as  the  gold  standard  for  ROC  analysis  and  classification  evaluation.  (D)  A 
classified  FT-IR  image  in  which  all  pixels  are  labeled  as  stroma  or  epithelium  accurately  corresponds  to  the  H&E 
stained  image.  The  classification  does  not  require  stains  or  human  interpretation. 
4. RESULTS


4.1  Effect  of  spectral  resolution  on  tissue  segmentation 

The  impact  of  spectral  resolution  on  classification  performance  is  evaluated  by  downsampling  spectra  at  every  pixel  with 
a neighbor binning and interpolation procedure. FT-IR image data sets are acquired at 4 cm⁻¹ spectral resolution and are
downsampled to 8, 16, 32, 64, and 128 cm⁻¹ resolution. As seen in Figure 2A, an average spectrum at each resolution
from  epithelial  cells  in  the  gold  standard  demonstrates  that  important  spectral  elements  remain  identifiable  at  coarser 
resolutions.  While  we  anticipate  that  the  area  under  the  peaks  would  be  preserved,  peak  shapes  begin  to  change  at  a 
coarser spectral resolution of 32 or 64 cm⁻¹ due to overlaps in the complicated spectral response. It would not be
surprising  to  note  that  the  most  robust  predictors  of  class  incorporate  best  both  biological  diversity  and  spectral  noise 
(arising  from  both  measurement  and  artifacts).  Hence,  we  anticipate  that  the  use  of  these  metrics  would  also  prove  robust 
when  spectra  are  downsampled.  Figure  2B  demonstrates  that  the  classification  accuracy  is  not  significantly  affected  until 
the spectral resolution is decreased to 128 cm⁻¹.

The result is indeed surprising as numerous prior biomedical studies with vibrational spectroscopy have employed 4 cm⁻¹
to 16 cm⁻¹ spectral resolution. There are two important differences between the problem here and a majority of those
studies.  First,  many  of  the  reported  studies  used  sensitive  spectral  analysis  tools  (e.g.  second  derivatives)  or  were  looking 
for  fine  spectral  features.  Second,  models  for  pathology  may  have  needed  more  complex  information.  Here,  we  are 
examining  a  2  class  problem  of  very  distinct  cell  types.  Hence,  the  acceptable  classification  at  very  coarse  resolutions  is 
likely  permitted  by  the  significant  biochemical  differences  between  stroma  and  epithelium  in  the  metrics  selected. 
Previous  studies  have  provided  evidence  of  clear  differences  in  IR  spectra  from  DNA-rich  tissues  such  as  epithelium  and 
RNA and protein-rich tissues such as stroma,14,20 especially in the IR fingerprint region from 500-1500 cm⁻¹.8 We
hypothesize  that  a  more  complex  model  with  additional  tissue  classes  would  likely  require  a  higher  spectral  resolution 
for  reasonable  classification,  but  that  this  resolution  is  not  required  to  distinguish  stroma  and  epithelium. 

A  powerful  feature  of  the  algorithm  we  employ  is  the  utilization  of  prominent  spectral  features  for  classification.  Here, 
the  features  selected  as  classification  metrics  are  not  very  sensitive  to  changes  in  spectral  resolution.36  Absorbance  values 
are  accurate  if  the  peak  full  width  at  half  maximum  (FWHM)  is  not  significantly  less  than  the  spectral  resolution.  As 
biological  materials  have  broad  and  overlapping  lineshapes,  the  condition  holds  even  for  very  coarse  resolutions. 
Therefore,  the  values  of  spectral  metrics  are  not  significantly  altered  even  if  some  details  in  the  spectrum  are  affected  at 
coarser  spectral  resolutions.  The  center  of  gravity  metrics  used  for  classification  are  particularly  robust,  as  they 
incorporate  peak  position  and  shape  and  are  not  strongly  influenced  by  peak  modifications  in  downsampled  spectra.  Care 
must  be  exercised  in  making  this  extrapolation  to  all  data  quality.  For  example,  for  poor  signal  to  noise  ratio  spectra,  the 
center  of  gravity  calculation  will  be  sensitive  to  noise. 
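
The robustness of a center of gravity metric to downsampling can be illustrated with a minimal sketch. The band below is a hypothetical, symmetric, noise-free profile near the amide I region, which is a favorable case; real bands are asymmetric and noisy, so the agreement shown here should be read as an upper bound. The function names are introduced for illustration.

```python
def center_of_gravity(wavenumbers, absorbances):
    """Absorbance-weighted mean position of a band."""
    total = sum(absorbances)
    if total == 0:
        raise ValueError("band has zero integrated absorbance")
    return sum(w * a for w, a in zip(wavenumbers, absorbances)) / total


def average_pairs(values):
    """Bin adjacent points in pairs, halving the number of points."""
    return [(values[i] + values[i + 1]) / 2
            for i in range(0, len(values) - 1, 2)]


# Hypothetical band sampled every 4 cm^-1.
wn = [1640, 1644, 1648, 1652, 1656, 1660, 1664, 1668]
ab = [0.0, 0.1, 0.2, 0.3, 0.3, 0.2, 0.1, 0.0]

fine = center_of_gravity(wn, ab)
coarse = center_of_gravity(average_pairs(wn), average_pairs(ab))
print(round(fine, 3), round(coarse, 3))
```

Because the metric is a weighted average over the whole band, redistributing absorbance among neighboring points moves it far less than it moves the apparent peak maximum.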




Fig. 2. Spectral resolution effect on classification. (A) Epithelial spectra obtained by downsampling data acquired at 4 cm⁻¹
indicate that IR spectrum quality degrades appreciably at a spectral resolution coarser than 16 cm⁻¹, as anticipated for
condensed phase biological materials. (B) AUC analysis for stroma and epithelium segmentation for each resolution
demonstrates a significant decrease in classification accuracy only at a very coarse spectral resolution beyond 64 cm⁻¹.
The effective classification in downsampled FT-IR images presented in this manuscript indicates potential for faster data
acquisition without significant loss in classification accuracy. Figure 2 suggests that no significant classification
differences are observed in images up to 64 cm⁻¹. Since data acquisition time is estimated to decrease linearly with
spectral  resolution,29  FT-IR  images  could  be  acquired  16  times  as  fast  without  any  loss  in  classification  performance  for 
the  two  class  model  presented  in  this  manuscript.  Again,  we  emphasize  that  the  results  are  preliminary  and  should  be 
carefully  validated.  Nevertheless,  the  idea  of  optimizing  data  acquisition  by  modeling  the  results  of  other  experimental 
conditions  is  an  important  one  that  should  be  pursued  in  practical  translation  of  these  protocols  for  clinical  use. 
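
The 16-fold estimate follows directly from the linear trading rule. As a worked sketch, writing the spectral resolution as $\Delta\tilde{\nu}$ and the acquisition time as $t$ (symbols introduced here for illustration), and holding the SNR and all other acquisition settings fixed:

```latex
t \propto \frac{1}{\Delta\tilde{\nu}}
\quad\Longrightarrow\quad
\frac{t(4\ \mathrm{cm}^{-1})}{t(64\ \mathrm{cm}^{-1})}
= \frac{64\ \mathrm{cm}^{-1}}{4\ \mathrm{cm}^{-1}} = 16 .
```

The same reasoning gives factors of 2, 4 and 8 for the 8, 16 and 32 cm⁻¹ settings, respectively.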


4.2  Effect  of  spectral  noise  on  tissue  segmentation 

Evaluation  of  acceptable  spectral  noise  for  FT-IR  image  classification  is  important  for  efficient  data  collection.  For 
practical  applications,  it  is  advantageous  to  acquire  data  with  the  lowest  SNR  that  permits  reasonable  classification.  Raw 
data  is  acquired  with  a  peak-to-peak  noise  value  of  0.011  au,  a  root  mean  square  (rms)  noise  value  of  0.008  au,  and  an 
average  amide  I  height  of  0.328  au.  To  assess  the  impact  of  spectral  noise  on  classification  accuracy,  Gaussian  noise  is 
added  with  a  standard  deviation  of  0.001,  0.01,  and  0.1  au.  Figure  3  provides  a  qualitative  evaluation  of  histologic 
images  from  the  acquired  data  set  (Figure  3A)  and  from  the  data  sets  with  added  Gaussian  noise  (Figures  3B-D). 

These  images  indicate  that  acceptable  classification  is  achieved  when  noise  is  added  at  a  standard  deviation  of  0.001  au 
(Figure  3B),  but  that  classification  accuracy  appreciably  decreases  with  the  addition  of  noise  at  or  above  a  standard 
deviation  of  0.01  au.  This  is  expected,  since  adding  noise  at  a  standard  deviation  of  0.001  au  does  not  significantly 
change  the  FT-IR  image  data  SNR.  The  data  set  with  noise  added  at  a  standard  deviation  of  0.01  au  (Figure  3C)  produces 
a  classified  image  with  regions  of  distinguishable  stroma  and  epithelium,  although  there  are  numerous  stray  pixels  that 
are not correctly classified, similar to salt and pepper noise. Upon the addition of noise at a standard deviation of 0.1 au, classified images
become  completely  indistinguishable  (Figure  3D),  including  the  misidentification  of  many  pixels  on  the  empty  region  of 
the  slides  as  tissue.  This  loss  in  classification  accuracy  is  caused  by  an  underlying  broadening  of  spectral  metric 
distributions  for  each  class.  This  broadening  bridges  the  difference  in  metric  values.  The  overlap  in  values  in  turn 
decreases  classification  confidence  as  measured  by  the  AUC.  Hence,  we  have  used  the  AUC  as  a  reasonable  measure  of 
the  classification  accuracy  at  every  experimental  condition. 
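
The statement that added noise at 0.001 au leaves the SNR essentially unchanged can be checked with a short calculation. The sketch below assumes that the added noise is independent of the raw noise, so that the two combine in quadrature, and it defines SNR as the average amide I height divided by the rms noise; both are simplifying assumptions, and the function name is hypothetical.

```python
import math


def total_noise(raw_rms, added_sigma):
    """Combine independent noise sources in quadrature."""
    return math.sqrt(raw_rms ** 2 + added_sigma ** 2)


AMIDE_I_HEIGHT = 0.328  # au, average amide I height from the text
RAW_RMS = 0.008         # au, rms noise of the raw data from the text

print("raw SNR:", round(AMIDE_I_HEIGHT / RAW_RMS, 1))
for sigma in (0.001, 0.01, 0.1):
    snr = AMIDE_I_HEIGHT / total_noise(RAW_RMS, sigma)
    print("added sigma", sigma, "au -> SNR", round(snr, 1))
```

Under these assumptions the SNR moves from about 41 for the raw data to roughly 40.7, 25.6 and 3.3 for the three added noise levels, which matches the qualitative progression seen in the classified images.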

A  plot  of  AUC  against  the  added  noise  (Figure  3E)  demonstrates  that  the  AUC  value  remains  relatively  constant  with  the 
addition  of  low  levels  of  noise.  It  then  decreases  to  a  mean  AUC  of  0.77  with  the  addition  of  noise  at  a  standard 
deviation  of  0.01  au  and  falls  to  a  mean  AUC  of  ~0.5  at  a  noise  standard  deviation  of  0.1  au.  It  is  surprising  that  the 
stroma  AUC  actually  falls  below  0.5.  Though  the  AUC  values  should  not  be  below  0.5  for  classified  images,  our 
algorithm  contains  a  pixel  rejection  step.  A  pixel  is  rejected  if  the  measured  metric  values  do  not  lie  within  the  prior 
probability distributions. Hence, a small number of pixels are rejected at low noise levels and are not accounted for.

Fig. 3. Effect of noise on FT-IR image classification. Classified images are shown for (A) raw data, (B) data with Gaussian
noise added at a standard deviation of 0.001 au, (C) data with Gaussian noise added at a standard deviation of 0.01 au,
and (D) data with Gaussian noise added at a standard deviation of 0.1 au. (E) The AUC values for classification with
noise added at a standard deviation of 0.001, 0.01, and 0.1 au confirm that classification accuracy is reasonable with a
small amount of additional noise but unsatisfactory in data with a noise standard deviation at or above 0.01 au.
For the two-class stroma and epithelium segmentation model presented in this manuscript, an AUC value of 0.77 does
not indicate sufficient classification confidence. We would expect nearly perfect discrimination of these two types of
tissue  since  there  are  numerous  spectral  features  that  distinguish  epithelium  and  stroma.14'20  32  '4  An  estimated 
classification  accuracy  of  0.5  for  this  model  is  equivalent  to  random  guessing  and  does  not  provide  any  information 
about  tissue  histology.  Examination  of  the  curve  in  Figure  3E  indicates  that  some  additional  spectral  noise  at  a  level  of 
0.001  can  be  present  without  loss  in  classification  accuracy  for  this  two  class  model.  We  did  not  observe  any  difference 
in  this  behavior  with  pathology  of  the  tissue.  Breast  tumor  tissue  is  often  very  heterogeneous  and  precise  pixel 
classification is needed to produce reasonable automated classification results. Hence, these results represent a good
starting  point  to  optimize  a  practical  protocol.  There  may  also  be  a  patient  or  clinical  setting  dependence  of  these  optimal 
operating  points  that  remains  to  be  probed.  From  the  plot,  it  is  likely  that  we  are  close  to  the  operating  point  of  a  practical 
protocol,  as  addition  of  a  small  amount  of  noise  (>0.01  au)  makes  the  classification  unstable. 

Last,  the  classification  algorithm  was  optimized  using  a  noise  level  similar  to  that  of  the  acquired  data  set  presented  in 
this manuscript. Hence, the optimal metric sets and discriminant function are obtained for that noise level. It would not
prove  surprising  if  a  de  novo  training  and  optimization  of  lower  quality  data  could  yield  similar  results.  A  de  novo 
classification  algorithm  development,  however,  is  not  guaranteed  to  produce  equivalent  results  for  the  higher  noise  cases 
and  will  fail  where  overlap  between  the  prior  distributions  is  significant  due  to  noise  broadening.  Hence,  we  believe  that 
the  conditions  found  here  are  close  to  optimal. 

4.3  Noise  reduction  with  the  MNF  transform 

In  this  manuscript,  we  have  used  an  instrument  with  a  high  performance  detector  that  has  a  low  multichannel  detection 
advantage.  FT-IR  imaging  using  large  focal  plane  array  (FPA)  detectors,  however,  is  a  promising  avenue  for  rapid  data 
acquisitions due to the large multichannel advantage. Imaging with FPAs, unfortunately, often results in low signal-to-noise
ratio (SNR) data due to the poor detector characteristics and other limitations.37 From the trading rules of FT-IR
spectroscopy,29 achieving a factor of n improvement in SNR would result in an increase of n^2 in data collection time. An
alternative  to  improve  SNR  is  to  employ  post-processing  algorithms  to  reduce  noise.  One  such  avenue  for  noise 
reduction  is  the  use  of  the  minimum  noise  fraction  (MNF)  transform.  The  MNF  transform  can  be  used  in  a  mathematical 
procedure  to  remove  uncorrelated  contributions  from  the  spatial  and  spectral  domains.  First,  a  forward  transform  is  used 
to  perform  a  factor  analysis  and  re-order  spectral  data  in  the  order  of  decreasing  SNR.  The  MNF  calculation  is  a  two-step 
process.  A  noise  covariance  matrix  is  estimated  and  used  to  decorrelate  and  rescale  the  noise  in  the  data.  Subsequently,  a 
standard PCA is performed on the noise-whitened data. A second step is to select only those factors that correspond to a
sufficiently  high  SNR  by  examining  the  eigenvalue  images.  The  first  few  eigenvalue  images  generally  correspond  to 
higher  SNR  values  and  contain  most  of  the  useful  information.  Noise  reduction  is  achieved  by  suppressing  the  later 
factors  corresponding  largely  to  noise  or  zero-filling  components  and  inverse  transforming  the  data.  A  noise  reduction  by 
a  factor  greater  than  5  could  be  achieved  by  this  technique  if  the  initial  SNR  is  sufficiently  high.38,39  Though  the  utility  of 
this  method  is  demonstrated  for  IR  imaging,40  its  use  has  not  been  widespread.  Further,  the  use  of  MNF  transformed  data 
for  tissue  classification  has  not  been  attempted. 
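The MNF-based noise reduction described in this paragraph can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation used for the results: the noise covariance is estimated from differences between horizontally adjacent pixels (a common choice, assumed here), the number of retained factors is passed in by hand instead of being chosen from eigenvalue images, and the function names and the synthetic test cube are hypothetical.

```python
import numpy as np


def mnf_denoise(cube, n_keep):
    """Noise-whiten, apply PCA, keep the first `n_keep` factors, and invert.

    cube: array of shape (rows, cols, bands) holding one spectrum per pixel.
    """
    rows, cols, bands = cube.shape
    pixels = cube.reshape(-1, bands)
    mean = pixels.mean(axis=0)
    centered = pixels - mean

    # Assumption: neighbouring pixels share signal, so half the covariance of
    # horizontal pixel differences approximates the noise covariance.
    diffs = (cube[:, 1:, :] - cube[:, :-1, :]).reshape(-1, bands)
    noise_cov = np.cov(diffs, rowvar=False) / 2.0

    # Step 1: decorrelate and rescale the noise (whitening).
    noise_vals, noise_vecs = np.linalg.eigh(noise_cov)
    noise_vals = np.maximum(noise_vals, 1e-12 * noise_vals.max())
    whiten = noise_vecs / np.sqrt(noise_vals)

    # Step 2: standard PCA on the noise-whitened data, ordered by decreasing SNR.
    white_cov = np.cov(centered @ whiten, rowvar=False)
    pc_vals, pc_vecs = np.linalg.eigh(white_cov)
    order = np.argsort(pc_vals)[::-1]
    forward = whiten @ pc_vecs[:, order]
    inverse = np.linalg.inv(forward)

    # Suppress the later (noise-dominated) factors and inverse transform.
    factors = centered @ forward
    factors[:, n_keep:] = 0.0
    return (factors @ inverse + mean).reshape(rows, cols, bands)


def make_demo_cube(rows=32, cols=32, bands=20, sigma=0.05, seed=0):
    """Hypothetical two-component cube with a smooth spatial gradient plus noise."""
    rng = np.random.default_rng(seed)
    axis = np.arange(bands)
    spec_a = 0.3 * np.exp(-0.5 * ((axis - 6) / 2.0) ** 2)
    spec_b = 0.3 * np.exp(-0.5 * ((axis - 13) / 2.0) ** 2)
    frac = np.linspace(0.0, 1.0, cols)[None, :, None]
    mixed = (1.0 - frac) * spec_a + frac * spec_b
    clean = np.broadcast_to(mixed, (rows, cols, bands)).copy()
    noisy = clean + rng.normal(0.0, sigma, size=clean.shape)
    return clean, noisy


if __name__ == "__main__":
    clean, noisy = make_demo_cube()
    denoised = mnf_denoise(noisy, n_keep=2)
    print("rms error before:", np.sqrt(np.mean((noisy - clean) ** 2)))
    print("rms error after: ", np.sqrt(np.mean((denoised - clean) ** 2)))
```

Keeping every factor returns the input unchanged, so the noise reduction comes entirely from the choice of how many factors to retain.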

We  propose  to  use  the  MNF  transform  route  as  a  method  for  fast  data  acquisition  without  loss  in  classification  accuracy. 
The  protocol  involves  rapid  data  collection  at  a  low  SNR,  followed  by  application  of  MNF  transform  for  noise  reduction. 
Classification  is  then  performed  on  these  noise-reduced  images.  It  must  be  noted  that  the  gain  here  is  through 
computational  techniques  and  does  not  involve  changes  in  instrumentation  hardware  or  data  acquisition  time.  A 
secondary  advantage  that  may  arise  is  that  decreasing  the  variance  in  spectral  data  could  also  decrease  the  biologic 
variance  in  the  data  and  should  improve  separation  of  tissue  classes.  Excessive  image  noise  will  broaden  spectral  metric 
distributions  for  each  class,  which  increases  the  error  associated  with  each  metric  and  decreases  classification 
confidence. Therefore, if the metric distribution mean values for each class are sufficiently different, decreasing noise
will  decrease  the  area  of  metric  distribution  overlap  and  improve  segmentation  confidence. 

The  impact  of  noise  reduction  on  classification  is  demonstrated  in  Figure  4.  The  MNF  transform-based  protocol  is 
applied  to  the  acquired  data  set  and  the  data  sets  with  Gaussian  noise  added  as  discussed  in  the  previous  section. 
Classified images are displayed for each noise level after MNF transform-aided noise reduction (Figures 4A-D). The
AUC values for the MNF transformed image sets are compared with the AUC values for noisy images (Figure 4E).

Fig. 4. Improvement in automated FT-IR image classification with the application of the MNF transform. Classified images
from  MNF  transformed  FT-IR  images  are  shown  for  (A)  raw  data,  (B)  data  with  Gaussian  noise  added  at  a  standard 
deviation  of  0.001  au,  (C)  data  with  Gaussian  noise  added  at  a  standard  deviation  of  0.01  au,  and  (D)  data  with 
Gaussian  noise  added  at  a  standard  deviation  of  0.1  au.  (E)  Comparing  AUC  values  for  original  FT-IR  images  and 
MNF  transformed  FT-IR  images  demonstrates  that  classification  improves  with  noise  reduction,  especially  when  the 
noise  has  a  standard  deviation  of  0.01  -  0.1  au. 


Evaluation  of  classified  images  and  AUC  values  indicates  that  the  MNF  transform  improves  classifier  performance  for 
each  image.  Given  that  the  classification  accuracy  was  very  high,  the  effects  of  MNF  transform  are  significant  only  when 
the  noise  level  degrades  the  original  data.  Nevertheless,  it  can  be  seen  from  the  figure  that  the  high  accuracy  is  recovered 
for  an  order  of  magnitude  increase  in  data  noise.  Therefore,  application  of  the  MNF  transform  on  data  acquired  with 
these  noise  distributions  will  make  a  significant  difference  in  classifier  performance.  Specifically,  we  can  acquire  data 
with  a  noise  standard  deviation  of  0.01  au  and  provide  accuracy  levels  that  are  comparable  to  those  obtained  in  our 
measurements  of  lower  noise.  This  finding  is  significant  in  that  noise  levels  of  0.01  au  are  commonly  obtained  in  rapidly 
acquired  FT-IR  imaging  data  sets  with  large  array  detectors.  Further,  since  the  classification  accuracy  seems  to  be  little 
affected  by  spectral  resolution,  we  can  anticipate  that  it  will  be  little  affected  by  the  choice  of  an  apodization  function 
and other minor sources of error for a reasonable spectral resolution. Hence, we contend that the protocol developed here
would  be  well-suited  to  rapid  imaging  with  large  array  detectors. 
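As a rough illustration of what tolerating a higher noise level could mean for acquisition time, the arithmetic below applies the trading rule quoted earlier (collection time scaling as the square of SNR). It assumes that noise depends only on collection time and that the acquired (0.008 au rms) and added (0.01 au) noise are independent; treating their quadrature sum as the tolerable noise level is an assumption made here, not a result from the manuscript.

```latex
\frac{t_{\text{low noise}}}{t_{\text{high noise}}}
  = \left(\frac{\sigma_{\text{high}}}{\sigma_{\text{low}}}\right)^{2}
  = \frac{0.008^{2} + 0.01^{2}}{0.008^{2}}
  = \frac{1.64\times10^{-4}}{6.4\times10^{-5}}
  \approx 2.6
```

Under these assumptions, data at the higher noise level would take roughly 2.6 times less time to collect than data at the acquired noise level.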


5.  CONCLUSIONS 

Recent  developments  in  FT-IR  imaging  and  data  processing  facilitate  new  applications  for  this  technology.  In  this 
manuscript,  we  report  an  initial  application  in  automating  histopathology  of  breast  tissue.  Supervised  segmentation  of 
breast  stroma  and  epithelium  in  FT-IR  images  is  presented  and  nearly-perfect  classification  accuracy  is  estimated.  The 
impacts  of  spectral  resolution  and  noise  on  image  classification  are  evaluated.  Results  in  this  paper  demonstrate  that 
spectral  resolution  can  be  decreased  16-fold  without  loss  in  classification  accuracy.  The  classification  algorithm  is  more 
sensitive  to  noise,  but  noise  reduction  with  the  MNF  transform  can  improve  classification  accuracy  while  decreasing  the 
time  required  for  data  collection.  This  evaluation  of  the  impact  of  experimental  parameters  on  classification  accuracy 
represents a first step in developing a practical protocol for rapid and automated histopathology.

Midinfrared (IR) microspectroscopy is widely employed
for spatially localized spectral analyses. A comprehensive
theoretical model for the technique, however, has not
been previously proposed. In this paper, rigorous theory
is presented for IR absorption microspectroscopy by using
Maxwell’s equations to model beam propagation. Focusing
effects, material dispersion, and the geometry of
the sample are accounted for to predict spectral response
for homogeneous samples. Predictions are validated
experimentally using Fourier transform IR (FT-IR)
microspectroscopic examination of a photoresist. The results
emphasize that meaningful interpretation of IR
microspectroscopic data must involve an understanding of the
coupled optical effects associated with the sample,
substrate properties, and microscopy configuration.
Simulations provide guidance for developing experimental
methods and future instrument design by quantifying distortions
in the recorded data. Distortions are especially severe for
transflection mode and for samples mounted on certain
substrates. Last, the model generalizes to rigorously
consider the effects of focusing. While spectral analyses
range from examining gross spectral features to assessing
subtle features using advanced chemometrics, the
limitations imposed by these effects in the data acquisition on
the information available are less clear. The distorting
effects are shown to be larger than noise levels seen in
modern spectrometers. Hence, the model provides a
framework to quantify spectral distortions that may limit
the accuracy of information or present confounding effects
in microspectroscopy.

Infrared (IR) absorption spectroscopy has been coupled to
microscopy in various configurations for over 50 years.1 The
modern coupling of an interferometer with a microscope and
mapping stage has enabled raster recording of Fourier transform
infrared (FT-IR) spectra,2 considerably accelerating the number
of studies and the scope of analysis by making instrumentation
practical.3 Numerous applications have been reported, for
example, in materials science,4 forensics,5 and biomedical research.6,7
For a number of reasons, these raster mapping systems are best
utilized for point examination of specific sample areas.8
Significantly higher imaging speeds and practical wide-field imaging can
now be routinely attained by FT-IR microspectroscopy with focal
plane array (FPA) detectors.9,10 Hence, one can consider the
current  state  of  mid-IR  microscopy  to  consist  of  two  diverging 
modes.  In  the  first,  point  microspectroscopy  is  conducted  on  small, 
homogeneous  domains.  The  data  acquisition  is  often  guided  by 
visible-band  microscopy  that  is  parfocal  and  colinear  with  the  IR 
beam  in  the  instrument;  this  mode  is  called  point  microscopy. 
The  second  major  modality  utilizes  array  detectors  to  measure 
across  large  areas  of  samples,  with  a  spectrum  recorded  for  each 
of  tens  to  thousands  of  pixels;  this  mode  is  called  imaging.  The 
two modes are related and collectively termed microspectroscopy,
in  that  both  use  focusing  to  collect  spectra  from  small  regions. 

Ideally,  FT-IR  microspectroscopy  can  be  thought  to  be  an 
extension  of  IR  spectroscopy  localized  to  specific  points  in  the 
sample. However, as FT-IR microspectroscopy is currently
practiced,3 this description is not accurate. The geometry of the sample
boundaries,  the  morphology  within  the  sample,  the  surrounding 
media,  and  the  imaging  optics  all  affect  the  measurements.  In  any 
given  data  set,  the  net  contribution  of  all  of  these  effects  is 
observed,  so  that  the  spectra  generally  differ  from  the  spectral 
response  of  the  bulk  material  in  the  sample.  Previous  analyses  of 
spectral  differences  between  bulk  and  microscopy  data  have 
focused  on  the  effects  of  stray  light,  the  effects  of  oblique  incidence 
on  corrections  to  Beer’s  law11  and  orientation  measurements.4 
Reports  of  optical  distortions  in  FT-IR  imaging  have  focused  on  the 
role  of  interfaces,12  on  scattering  at  an  edge13  and  on  scattering  by 
the  sample.14  Distortions  in  a  reflection-absorption  (transflection) 
measurement geometry have received particular attention,15,14 e.g.,
a  transformation  procedure  to  correct  a  dispersive  phase  error 
has  been  proposed.13  However,  no  study  in  the  literature  has 
rigorously  addressed  the  cause  of  apparent  spectral  artifacts  and 
morphological  distortions  from  first  principles  as  a  function  of  the 
microscope  design  and  sample  parameters.  This  situation  is  in 
sharp  contrast  to,  for  example,  fluorescence  microscopy,  where 
the  theory  is  highly  sophisticated  and  numerical  corrections  to 
the  data  can  be  confidently  made.16-20 

Considerable  care  must  be  taken  in  applying  the  methods  of 
analysis  from  visible  microscopy  to  infrared  microspectroscopy. 
Fluorescent  emissions  from  distinct  positions  within  the  sample 
are  uncorrelated,  for  example,  allowing  relatively  simple  modeling 
of  image  formation.  In  the  visible  and  near-IR  spectral  regions, 
samples  typically  exhibit  weak  and/ or  broad  absorbance  profiles 
and  the  dominant  intrinsic  optical  process  is  scattering.  Hence, 
the  usual  approach  is  to  describe  the  sample  as  a  collection  of 
nondispersive  scattering  inhomogeneities.  In  the  mid-IR,  however, 
fundamental  vibrational  modes  of  molecular  species  are  resonant 
with  the  optical  frequencies  of  the  incident  radiation.  These 
resonances  lead  to  sharp  and  strong  absorption  features  that,  of 
course,  form  the  very  basis  of  spectroscopy.  As  a  consequence, 
the  imaginary  (absorptive)  part  of  the  refractive  index  is  significant 
and  the  real  part  of  the  index  undergoes  a  large  anomalous 
dispersion.21  It  is  this  interplay  of  absorption  (the  contrast 
mechanism  in  IR  spectroscopy),  anomalous  dispersion,  and  optical 
energy  transport  that,  in  part,  leads  to  complications  in  recording 
and  understanding  of  data. 

In  this  manuscript,  rigorous  optical  theory  is  developed  for  IR 
microspectroscopy.  The  analysis  will  enable  an  understanding  of 
the  relationship  between  properties  of  the  sample  and  recorded 
data  and  will  enable  quantitative,  instrument  independent,  and 
sample-geometry  independent  data  interpretation.  While  the  scope 
is  limited  to  point  microscopy  of  samples  with  simple  layered 
structure  (i.e.,  no  transverse  variation)  in  this  manuscript,  it  is 
demonstrated  that  significant  spectral  differences  from  bulk 
measurements  and  significant  spectral  distortions  may  arise. 
When  nontrivial  transverse  sample  structure  or  morphology  is 
considered,  the  situation  becomes  more  complicated  and  that  case 
is  addressed  in  the  follow-up  article.22  Hence,  this  manuscript 
serves  both  to  help  in  understanding  the  sample-instrument 
effects  for  homogeneous  samples  and  as  a  basis  for  further 
development  of  IR  microspectroscopy  theory  for  complex  sample 
morphologies. 

The following sections first describe the development of a
mathematical model for point microspectroscopy. A planewave
solution of Maxwell’s equations is found for the sample-instrument
system, and this solution is used to construct a focused-field
solution. Next, numerical simulations are presented to systematically
examine the effects of focusing, dispersion, and sample
structure. The model is also experimentally validated on a
benchmarking sample.

Figure 1. Illustration of focusing transmission and transflection
optics. Cassegrains are used to focus light illuminating the sample
and to collect the light to be detected. Loci of constant ray length are
illustrated and represent optical phase fronts. The locus
above the upper Cassegrain can be regarded as an entrance pupil,
and the loci below the upper Cassegrain can be regarded as an exit
pupil. Note that for notational consistency, the illuminating light is
always considered to originate from above the sample (i.e., from the
z direction). The standard transmission case, where the sample is
illuminated from underneath, through the substrate,25 may be modeled
using reciprocity24 or by inverting the sample-substrate system, as
illustrated. In this illustration, and in the numerical simulations that
follow, the objective and condenser Cassegrains are assumed to be
matched, although the theory presented is general and can account
for mismatched Cassegrains.

THEORETICAL  MODEL 

Two generic optical systems for microspectroscopy are
illustrated in Figure 1. The condensing optics focus light onto a
sample  supported  by  a  planar  substrate.  The  sample  is  assumed 
to  be  a  layered  medium  without  transverse  structure.  The  resulting 
planar  geometry,  with  transverse  translational  invariance,  admits 
a  relatively  simple  solution  of  Maxwell’s  equations,23  and  boundary 
conditions  can  be  used  to  specify  an  incoming  field  consistent 
with  focusing.  As  illustrated  in  Figure  1,  transmission  and 
transflection geometries are considered. While many IR
microscopes are bottom illuminated for transmission and top
illuminated for transflection, here top illumination is considered
for both cases, in order to align the analytical treatment. It must
be noted that the transmission case is directionally invariant
for these samples,24 hence there is no loss of generality in
considering top illumination.

(23) Born, M.; Wolf, E. Principles of Optics, 6th ed.; Cambridge University Press: Cambridge, U.K., 1980.

The  electromagnetic  field  in  each  layer  of  the  sample- 
instrument  system  may  be  described  using  planewaves  which 
satisfy  both  the  boundary  conditions  and  Maxwell’s  equations. 
This  type  of  planewave  analysis26  is  commonly  encountered  in 
many  fields,  including  the  design  of  antireflection  coatings  and 
Fabry-Perot  interferometers.  In  contrast  to  many  such  analyses 
in  the  visible  region  of  the  spectrum,  it  is  necessary  to  consider 
both  the  real  (dispersive)  and  imaginary  (absorptive)  parts  of  the 
refractive  index  here.  The  focused  field  is  constructed  as  a  sum 
of planewaves incident from the diversity of angles dictated by
the focusing optics. Each planewave can be propagated through
the layered sample and substrate independently in this construction,
generalizing the approach of Török et al.27 Thus, the response
of  the  system  to  a  single  incident  plane  wave  is  addressed  first, 
then  the  incident  focused  field  is  described  and,  finally,  the  total 
resulting  measurement  is  predicted. 

Planewave  Solutions.  A  Cartesian  coordinate  system  is 
chosen  with  the  z  axis  perpendicular  to  the  planar  boundaries 
between sample layers. Position vectors are written $\mathbf{r} = (x, y, z)^T$,
where a superscript T denotes the vector transpose. Optical
fields are represented by complex amplitudes at each temporal
harmonic frequency $c\nu$, where $c$ is the speed of light in free
space and $\nu$ is the free space wavenumber. The permittivity
and permeability of free space are denoted $\epsilon_0$ and $\mu_0$,
respectively. It is assumed that the media in all layers are linear,
isotropic, and nonmagnetic (the relative permeability is unity), and
contain no free charge. The relative permittivity $\epsilon(\nu)$ or,
equivalently, the real and imaginary (absorptive) parts of the
refractive index ($n(\nu)$ and $k(\nu)$, respectively), vary from layer
to layer. A single complex planewave23 is then described by
electric and magnetic fields of the respective forms

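$$\mathbf{E}(\mathbf{r}, \nu, t) = \mathbf{E}_0 \exp(i 2\pi \nu\, \mathbf{s} \cdot \mathbf{r}) \exp(-i 2\pi \nu c t) \qquad (1)$$

$$\mathbf{H}(\mathbf{r}, \nu, t) = \sqrt{\frac{\epsilon_0}{\mu_0}}\, (\mathbf{s} \times \mathbf{E}_0) \exp(i 2\pi \nu\, \mathbf{s} \cdot \mathbf{r}) \exp(-i 2\pi \nu c t) \qquad (2)$$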

where  E0  is  the  planewave  amplitude  vector,  and  s  is  a  vector 
determining  the  direction  of  propagation  and  any  absorptive 
or evanescent decay of the field. The vector $\mathbf{s} = (s_x, s_y, s_z)^T$
obeys the dispersion relation

$$s_x^2 + s_y^2 + s_z^2 = \epsilon(\nu) = [n(\nu) + i k(\nu)]^2 \qquad (3)$$

For convenience, the time harmonic factors $\exp(-i 2\pi \nu c t)$ in eqs
1 and 2 are suppressed for the remainder of this article.

Samples  for  infrared  microspectroscopy  are  typically  mounted 
on  a  substrate  and  are  present  in  air.  Hence,  a  homogeneous 
sample  may  be  considered  to  be  a  multilayer  structure  in  which 
the sample, substrate, and air form a three layer system. For
convenience, the effects of atmospheric absorption are
neglected here.

(24) Potton, R. J. Rep. Prog. Phys. 2004, 67, 717-754.
(25) Carr, G. L. Rev. Sci. Instrum. 2001, 72, 1613-1619.
(26) Yeh, P. Optical Waves in Layered Media, 2nd ed.; Wiley-Interscience: Hoboken, NJ, 2005.
(27) Török, P.; Varga, P.; Laczik, Z.; Booker, G. R. J. Opt. Soc. Am. A 1995, 12, 325-332.

To generalize, the model system consists of $L$
layers, each parallel to the x-y plane. In each layer, the field
may be written as a superposition of planewaves of the form
described above, the so-called angular spectrum.23 The electric
field in the $\ell$th layer, that is, in the $z$ interval between the
boundaries $z^{(\ell-1)}$ and $z^{(\ell)}$ (where $z^{(\ell)} > z^{(\ell-1)}$), is given by the
sum over all components of the planewave angular spectrum
in the slab,

$$\mathbf{E}^{(\ell)}(x, y, z, \nu) = \nu \iint_{\mathbb{R}^2} \Big\{ \mathbf{B}^{(\ell)}(s_x, s_y, \nu) \exp\big[i 2\pi \nu s_z^{(\ell)} (z - z^{(\ell-1)})\big] + \hat{\mathbf{B}}^{(\ell)}(s_x, s_y, \nu) \exp\big[-i 2\pi \nu s_z^{(\ell)} (z - z^{(\ell)})\big] \Big\} \exp[i 2\pi \nu (s_x x + s_y y)] \, ds_x \, ds_y \qquad (4)$$

where, by eq 3,

$$s_z^{(\ell)} = \sqrt{[n^{(\ell)}(\nu) + i k^{(\ell)}(\nu)]^2 - s_x^2 - s_y^2} \qquad (5)$$

The principal value of the square root is taken by definition, so
that the downward-propagating angular spectrum $\mathbf{B}^{(\ell)}(s_x, s_y, \nu)$
and the upward-propagating angular spectrum $\hat{\mathbf{B}}^{(\ell)}(s_x, s_y, \nu)$
must be explicitly distinguished in eq 4. The factor of $\nu$ is
included to ensure that angular spectra constant in $\nu$ produce a
power spectrum also constant in $\nu$. Also note that the downward
propagating light, described by $\mathbf{B}^{(\ell)}(s_x, s_y, \nu)$, is referenced to
the upper boundary of the layer $z^{(\ell-1)}$, and the upward
propagating light, described by $\hat{\mathbf{B}}^{(\ell)}(s_x, s_y, \nu)$, is referenced to
the lower boundary of the layer $z^{(\ell)}$.
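The role of the principal square root in eq 5 can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and the example values (a 45° planewave entering a layer of index 1.4 + 0.05i at a free-space wavelength of 10 µm, i.e., 1000 cm^-1) are borrowed from the Figure 2 caption.

```python
import cmath
import math


def longitudinal_component(n, k, s_x, s_y):
    """s_z from eq 5, using the principal value of the complex square root."""
    index = complex(n, k)
    return cmath.sqrt(index * index - s_x * s_x - s_y * s_y)


def amplitude_decay_per_micron(s_z, wavenumber_cm):
    """Factor by which |exp(i 2 pi nu s_z z)| changes over 1 micrometer of depth."""
    return math.exp(-2.0 * math.pi * wavenumber_cm * s_z.imag * 1e-4)


if __name__ == "__main__":
    s_x = math.sin(math.radians(45.0))  # transverse component set by the incidence angle in air
    s_z = longitudinal_component(1.4, 0.05, s_x, 0.0)
    print("s_z in the absorbing layer:", s_z)
    print("amplitude factor per micrometer:", amplitude_decay_per_micron(s_z, 1000.0))
    # Beyond the critical angle s_z is purely imaginary, i.e., the field is evanescent.
    print("evanescent s_z:", longitudinal_component(1.0, 0.0, 1.2, 0.0))
```

Because the principal root has non-negative real and imaginary parts here, the term with the positive exponent propagates and decays toward increasing z, which is why a separate upward-propagating spectrum is needed.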

The field in the sample is determined by the field incident
from the focusing optics, i.e., by $\mathbf{B}^{(1)}(s_x, s_y, \nu)$. This field
appears in eq 4 referenced to an arbitrary plane $z^{(0)}$ that does
not correspond to a layer boundary but is instead chosen
for convenience. Boundary conditions relate the incident
field to the field in each layer of the sample and to the field
in the substrate. Maxwell’s equations dictate these boundary
conditions and also require transversality of the field in each
layer. Explicitly, Gauss’ equation $\nabla \cdot \mathbf{E}(\mathbf{r}, \nu, t) = 0$ results
in the constraints

$$s_x B_x^{(\ell)}(s_x, s_y, \nu) + s_y B_y^{(\ell)}(s_x, s_y, \nu) + s_z^{(\ell)} B_z^{(\ell)}(s_x, s_y, \nu) = 0 \qquad (6)$$

and

$$s_x \hat{B}_x^{(\ell)}(s_x, s_y, \nu) + s_y \hat{B}_y^{(\ell)}(s_x, s_y, \nu) - s_z^{(\ell)} \hat{B}_z^{(\ell)}(s_x, s_y, \nu) = 0 \qquad (7)$$

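The requirement that the transverse components of $\mathbf{E}(\mathbf{r}, \nu, t)$ and
$\mathbf{H}(\mathbf{r}, \nu, t)$ are continuous across layer boundaries couples plane
wave components with the same arguments $(s_x, s_y, \nu)$ across
layers via the constraints

$$B_x^{(\ell)} \exp\big[i 2\pi \nu s_z^{(\ell)} (z^{(\ell)} - z^{(\ell-1)})\big] + \hat{B}_x^{(\ell)} = B_x^{(\ell+1)} + \hat{B}_x^{(\ell+1)} \exp\big[-i 2\pi \nu s_z^{(\ell+1)} (z^{(\ell)} - z^{(\ell+1)})\big] \qquad (8)$$

$$B_y^{(\ell)} \exp\big[i 2\pi \nu s_z^{(\ell)} (z^{(\ell)} - z^{(\ell-1)})\big] + \hat{B}_y^{(\ell)} = B_y^{(\ell+1)} + \hat{B}_y^{(\ell+1)} \exp\big[-i 2\pi \nu s_z^{(\ell+1)} (z^{(\ell)} - z^{(\ell+1)})\big] \qquad (9)$$

$$\big(s_y B_z^{(\ell)} - s_z^{(\ell)} B_y^{(\ell)}\big) \exp\big[i 2\pi \nu s_z^{(\ell)} (z^{(\ell)} - z^{(\ell-1)})\big] + \big(s_y \hat{B}_z^{(\ell)} + s_z^{(\ell)} \hat{B}_y^{(\ell)}\big) = \big(s_y B_z^{(\ell+1)} - s_z^{(\ell+1)} B_y^{(\ell+1)}\big) + \big(s_y \hat{B}_z^{(\ell+1)} + s_z^{(\ell+1)} \hat{B}_y^{(\ell+1)}\big) \exp\big[-i 2\pi \nu s_z^{(\ell+1)} (z^{(\ell)} - z^{(\ell+1)})\big] \qquad (10)$$

$$\big(s_z^{(\ell)} B_x^{(\ell)} - s_x B_z^{(\ell)}\big) \exp\big[i 2\pi \nu s_z^{(\ell)} (z^{(\ell)} - z^{(\ell-1)})\big] + \big(-s_z^{(\ell)} \hat{B}_x^{(\ell)} - s_x \hat{B}_z^{(\ell)}\big) = \big(s_z^{(\ell+1)} B_x^{(\ell+1)} - s_x B_z^{(\ell+1)}\big) + \big(-s_z^{(\ell+1)} \hat{B}_x^{(\ell+1)} - s_x \hat{B}_z^{(\ell+1)}\big) \exp\big[-i 2\pi \nu s_z^{(\ell+1)} (z^{(\ell)} - z^{(\ell+1)})\big] \qquad (11)$$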

For fixed arguments $(s_x, s_y, \nu)$ there are $6L$ unknowns, $3L$ for
$\mathbf{B}^{(\ell)}(s_x, s_y, \nu)$ and $3L$ for $\hat{\mathbf{B}}^{(\ell)}(s_x, s_y, \nu)$. The transversality
conditions of eqs 6 and 7 provide $2L$ linearly independent
equations (one pair for each layer) and the boundary conditions
of eqs 8-11 provide $4(L - 1)$ linearly independent equations (four
equations for each boundary). The remaining degrees of freedom
allow for the specification of the incident (incoming) field at the
top and bottom layers. At the last boundary, the $z = z^{(L-1)}$ plane,
the field is assumed to be strictly outgoing, i.e., the incoming
field is zero. Thus, it is required that

$$\hat{\mathbf{B}}^{(L)}(s_x, s_y, \nu) = 0 \qquad (12)$$

As  a  result,  there  are  only  two  degrees  of  freedom  in  the  system, 
which  are  identified  with  the  electric  field  amplitude  of  the 
illumination. 
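One way to read the counting argument above is written out below. The only step not stated explicitly in the text is that setting the incoming (upward-propagating) field in the bottom layer to zero removes two further degrees of freedom rather than three, because one of its three components is already fixed by the transversality condition for that layer.

```latex
\underbrace{6L}_{\text{unknowns}}
 \;-\; \underbrace{2L}_{\text{eqs 6 and 7}}
 \;-\; \underbrace{4(L-1)}_{\text{eqs 8--11}} \;=\; 4,
\qquad
4 \;-\; \underbrace{2}_{\text{eq 12}} \;=\; 2 .
```

For the three-layer air, sample, and substrate system ($L = 3$), this gives $18 - 6 - 8 = 4$ before the outgoing-field condition is applied and 2 afterwards, corresponding to the two transverse polarization components of the illumination.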

The total field is linear in the values of the illuminating field
$\mathbf{B}^{(1)}(s_x, s_y, \nu)$; hence, it is instructive to consider as an example the
case  of  single-planewave  illumination,  as  shown  in  Figure  2.  Notice 
that  the  coherent  superposition  of  transmitted  and  reflected  fields 
produces  interference  patterns  in  the  sample  and  that  the  absorption 
in  the  sample  results  in  decaying  amplitudes  into  the  media.  These 
effects  are  also  important  in  the  case  where  the  incident  field  consists 
of  a  superposition  of  planewaves  that  form  a  focus. 
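The single-planewave case can be sketched for the simplest polarization. For a y-polarized wave incident in the x-z plane, only the E_y continuity condition (eq 9) and the H_x continuity condition (eq 10 with s_y = 0) are needed, and the layer amplitudes can be found by stepping upward from the bottom layer, where the field is strictly outgoing. The code below is a minimal sketch under those assumptions; the function names and the layer thicknesses are hypothetical (the Figure 2 caption gives the indices, angle, and wavelength but not the thicknesses).

```python
import cmath
import math


def solve_y_polarized_stack(indices, inner_thicknesses_cm, wavenumber_cm, angle_deg):
    """Down- and up-going y-polarized amplitudes in every layer of a planar stack.

    indices: complex refractive index of each layer, top to bottom; the first
        and last entries are semi-infinite media (the top one assumed lossless).
    inner_thicknesses_cm: thickness of each finite layer in between.
    Amplitudes are normalized to unit incident amplitude, with the top-layer
    reference plane placed on the first boundary.
    """
    n_layers = len(indices)
    if n_layers < 2 or len(inner_thicknesses_cm) != n_layers - 2:
        raise ValueError("need one thickness per finite (interior) layer")
    s_x = complex(indices[0]).real * math.sin(math.radians(angle_deg))
    s_z = [cmath.sqrt(complex(n) * complex(n) - s_x * s_x) for n in indices]
    thickness = [0.0] + list(inner_thicknesses_cm) + [0.0]
    phase = [cmath.exp(2j * math.pi * wavenumber_cm * s * d)
             for s, d in zip(s_z, thickness)]

    down = [0j] * n_layers
    up = [0j] * n_layers
    down[-1] = 1.0 + 0j  # bottom layer: outgoing field only (eq 12)
    for layer in range(n_layers - 2, -1, -1):
        d_below = down[layer + 1]
        u_below = up[layer + 1] * phase[layer + 1]
        ratio = s_z[layer + 1] / s_z[layer]
        # Continuity of E_y and of s_z * (up - down) across the boundary.
        down_at_boundary = 0.5 * (d_below * (1 + ratio) + u_below * (1 - ratio))
        up[layer] = 0.5 * (d_below * (1 - ratio) + u_below * (1 + ratio))
        down[layer] = down_at_boundary / phase[layer]
    scale = down[0]
    return s_z, [a / scale for a in down], [a / scale for a in up]


def reflectance_transmittance(s_z, down, up):
    """Power reflectance and transmittance for lossless top and bottom media."""
    reflectance = abs(up[0]) ** 2
    transmittance = (s_z[-1].real / s_z[0].real) * abs(down[-1]) ** 2
    return reflectance, transmittance


if __name__ == "__main__":
    # Indices from the Figure 2 caption; thicknesses (10 and 15 micrometers) are hypothetical.
    s_z, down, up = solve_y_polarized_stack(
        [1.0, 1.4 + 0.05j, 1.4, 1.0], [1.0e-3, 1.5e-3], 1000.0, 45.0)
    r, t = reflectance_transmittance(s_z, down, up)
    print("R =", r, "T =", t, "absorbed =", 1.0 - r - t)
```

For a single interface this recursion reduces to the usual Fresnel coefficients for this polarization, and for lossless layers the reflectance and transmittance sum to one, which gives two quick consistency checks on the sketch.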

As  seen  in  Figure  2,  enforcing  eqs  6-11  results  in  a  solution 
where  transmission,  reflection,  and  interference  effects  all  play  a 
role.  However,  if  two  boundaries  are  separated  by  a  large  distance, 
the  exponential  factors  in  eqs  8-11  will  result  in  a  solution  that 
varies  rapidly  with  a  small  change  in  the  wavelength,  i.e.,  the 
interference  effects  will  change  rapidly  as  a  function  of  v.  Such  a 
situation  arises  when  light  propagates  through  a  mounting 
substrate  with  a  thickness  much  greater  than  a  wavelength.  This 
type  of  highly  oscillatory  spectral  behavior  will  not  be  resolved 
by  the  spectrometer,  meaning  that  the  interference  effects  from 
the  thick  layer  will  not  be  observed  in  the  data.  Hence,  in 
transmission  mode,  the  effect  of  the  mounting  substrate  can  be 
accurately described by modeling the distant substrate-air
boundary as uncoupled to the closely spaced boundaries, i.e.,
those associated with the sample. Thus eqs 6-11 need only to
be solved for closely spaced boundaries (the air-sample and the
sample-substrate boundaries), and the resulting field of interest
can  otherwise  be  propagated  through  the  distant  boundary  using 
standard  transmission  coefficients. 
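The argument that a thick substrate produces fringes too fine to resolve can be made concrete with a small calculation. For a lossless slab, the round-trip phase between its two boundaries is 4πν s_z d, so interference repeats every 1/(2 s_z d) in wavenumber. The thicknesses, index, and spectral resolution below are hypothetical values chosen only for illustration, and comparing the fringe period directly with the resolution is a crude heuristic.

```python
import math


def fringe_period_cm1(n_real, thickness_cm, angle_in_air_deg=0.0):
    """Wavenumber spacing of interference fringes for a lossless slab."""
    s_x = math.sin(math.radians(angle_in_air_deg))
    s_z = math.sqrt(n_real * n_real - s_x * s_x)
    return 1.0 / (2.0 * s_z * thickness_cm)


def fringes_resolved(period_cm1, resolution_cm1):
    """Crude heuristic: fringes are visible only if wider than the resolution."""
    return period_cm1 > resolution_cm1


if __name__ == "__main__":
    resolution = 4.0  # cm^-1, hypothetical spectral resolution
    for label, thickness_cm in (("10 um sample", 1.0e-3), ("2 mm substrate", 0.2)):
        period = fringe_period_cm1(1.4, thickness_cm)
        print(f"{label}: fringe period = {period:.2f} cm^-1, "
              f"resolved = {fringes_resolved(period, resolution)}")
```

With these illustrative numbers the thin sample gives fringes hundreds of wavenumbers apart, while the millimeter-scale substrate gives fringes under 2 cm^-1 apart, which a 4 cm^-1 measurement would average out.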

Figure 2. An example of the field produced in a layered sample
under unit-amplitude planewave illumination. The illuminating light is
incident at an angle of 45° in the x-z plane, is purely y-polarized,
and has a wavelength of 10 μm in free space. The real part of the
complex representation of the field is displayed. The indices of the
four media present are, from top to bottom, 1, 1.4 + 0.05i, 1.4, and
1. The boundaries of the media are marked with dashed lines.

Focused Illumination. While the previous subsection has
illustrated the interaction of planewave fields with a sample,
microspectroscopy involves the use of focusing optics to localize
signal  and  increase  local  throughput.  Focusing  optics  can  be 
modeled  using  geometrical  optics  techniques  such  as  ray  tracing. 
In  this  paradigm,  the  Cassegrain  arrangement  that  is  usually 
employed  for  focusing  in  the  mid-IR  maps  a  ray  on  the  entrance 
pupil  to  a  focused  ray  on  the  exit  pupil  as  illustrated  in  Figure  3. 
It  should  be  noted  that  the  locus  described  as  the  exit  pupil  will 
intersect  rays  emerging  from  the  Cassegrain  when  the  Cassegrain 
is  used  as  a  condenser  but  will  intersect  incoming  rays  when  the 
Cassegrain  is  used  for  collection  (i.e.,  as  an  objective). 

The angular spectrum amplitudes of the focused, illuminating
field, B(1)(sx, sy, v), can be associated with rays in the exit pupil.28
As illustrated in Figure 3, the vector elements sx and sy determine
not only the propagation direction of a focused ray but also
the intersection of the associated ray path and the entrance
pupil. The field in the pupil can therefore be expressed as a
vector function P(sx, sy, v). A matrix CI(sx, sy, v) relates P(sx, sy, v)
to B(1)(sx, sy, v) and explicitly accounts for the optical elements
(i.e., the Cassegrain) in the system,

$$ \mathbf{B}^{(1)}(s_x, s_y, \bar{\nu}) = \mathbf{C}_I(s_x, s_y, \bar{\nu})\, \mathbf{P}(s_x, s_y, \bar{\nu}) \tag{13} $$


(28)  Wolf,  E.  Proc.  R.  Soc.  London,  Ser.  A:  Math.  Phys.  Sci.  1959,  253,  349- 
357. 
Figure 3. An illustrative ray path through a Cassegrain system.
Mirrors  (heavy  lines)  reflect  rays  parallel  to  the  z  axis  at  the 
entrance  pupil  to  rays  at  the  exit  pupil  which  are  directed  to  the 
focal  point.  The  vector  s  gives  the  propagation  direction  for  a  ray 
and,  for  an  aplanatic  system,  the  transverse  component  of  this 
vector  (sx  in  this  two-dimensional  figure)  is  proportional  to  the 
transverse  position  at  which  the  ray  intersects  the  entrance  pupil. 
The  ray  path  for  sx  =  0  is  represented  by  the  dotted  line;  in  a 
Cassegrain,  this  ray  does  not  contribute  to  the  focused  field. 

The case of a lossless aplanatic focusing system has been
addressed by Richards and Wolf,29 and results from their work
can be used to find CI(sx, sy, v).

The construction of CI(sx, sy, v) is most easily accomplished
by defining polarization basis vectors before and after the
Cassegrain, namely, the transverse-electric (s-polarized) and
transverse-magnetic (p-polarized) vectors. Assuming that the
Cassegrain(s) is in free space, these vectors are


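$$ \mathbf{v}_s' = \mathbf{v}_s = \frac{1}{\sqrt{s_x^2 + s_y^2}}\, (-s_y,\; s_x,\; 0)^T \tag{14} $$

$$ \mathbf{v}_p = \frac{1}{\sqrt{s_x^2 + s_y^2}}\, (-s_x,\; -s_y,\; 0)^T \tag{15} $$

$$ \mathbf{v}_p' = \frac{1}{\sqrt{s_x^2 + s_y^2}}\, \left(-s_x s_z^{(1)},\; -s_y s_z^{(1)},\; s_x^2 + s_y^2\right)^T \tag{16} $$

where a prime indicates a vector on the exit pupil side of the
Cassegrain and no prime indicates the entrance pupil side.
Since the focusing is performed in free space and only
propagating waves are produced, sx² + sy² < 1, and sx, sy, and
sz(1) are all real.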
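
The geometric role of these basis vectors can be checked numerically. The sketch below assumes the standard forms v_s' = v_s = (-sy, sx, 0)/ρ, v_p = (-sx, -sy, 0)/ρ, and v_p' = (-sx sz, -sy sz, ρ²)/ρ with ρ² = sx² + sy² and sz = sqrt(1 - ρ²), and confirms that each is a unit vector, that the exit-side vectors are perpendicular to the ray direction s = (sx, sy, sz), and that the s and p vectors are mutually orthogonal. The function names are illustrative only.

```python
import math


def dot(a, b):
    """Euclidean dot product of two 3-tuples."""
    return sum(x * y for x, y in zip(a, b))


def basis_vectors(sx: float, sy: float):
    """Return (v_s, v_p, v_p_prime, s) for a propagating ray with 0 < sx^2 + sy^2 < 1."""
    rho2 = sx * sx + sy * sy
    if not 0.0 < rho2 < 1.0:
        raise ValueError("need 0 < sx^2 + sy^2 < 1 (off-axis propagating ray)")
    rho = math.sqrt(rho2)
    sz = math.sqrt(1.0 - rho2)
    v_s = (-sy / rho, sx / rho, 0.0)                      # s-polarized, same on both sides
    v_p = (-sx / rho, -sy / rho, 0.0)                     # p-polarized, entrance pupil side
    v_p_prime = (-sx * sz / rho, -sy * sz / rho, rho)     # p-polarized, exit pupil side
    s = (sx, sy, sz)                                      # focused ray direction
    return v_s, v_p, v_p_prime, s


if __name__ == "__main__":
    v_s, v_p, v_p_prime, s = basis_vectors(0.3, 0.4)
    print("v_p' . s =", dot(v_p_prime, s))
    print("|v_p'|^2 =", dot(v_p_prime, v_p_prime))
```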

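The field on the exit pupil P'(sx, sy, v) can be found by
mapping each ray through the focusing optics and correctly
accounting for conservation of energy.29 With neglect of the
constant phase factors,

$$ \mathbf{P}'(s_x, s_y, \bar{\nu}) = \sqrt{s_z^{(1)}}\, \left[\mathbf{v}_s' \mathbf{v}_s^T + \mathbf{v}_p' \mathbf{v}_p^T\right] \mathbf{P}(s_x, s_y, \bar{\nu}) \tag{17} $$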

The field on the exit pupil can then be used to determine the
resulting angular spectrum,28

$$ \mathbf{B}^{(1)}(s_x, s_y, \bar{\nu}) = \frac{\ell}{s_z^{(1)}}\, \mathbf{P}'(s_x, s_y, \bar{\nu}) \tag{18} $$

where ℓ is the focal length of the Cassegrain. The description of
the focusing optics then takes the form

$$ \mathbf{C}_I(s_x, s_y, \bar{\nu}) = f_s(s_x, s_y, \bar{\nu})\, \mathbf{v}_s' \mathbf{v}_s^T + f_p(s_x, s_y, \bar{\nu})\, \mathbf{v}_p' \mathbf{v}_p^T \tag{19} $$

and, in the lossless aplanatic case,

$$ f_s(s_x, s_y, \bar{\nu}) = f_p(s_x, s_y, \bar{\nu}) \propto \frac{\ell}{\sqrt{s_z^{(1)}}} \tag{20} $$

(29) Richards, B.; Wolf, E. Proc. R. Soc. London, Ser. A: Math. Phys. Sci. 1959,
253, 358-379.

More  generally,  fs(sx,  sy,  v)  and  fp(sx,  sy,  v)  can  be  modified  to 
capture  losses,  aberrations,  and  the  central  obstruction  in  the 
Cassegrain.  Note  that  it  is  implicit  in  this  treatment  that  the 
illumination  reference  plane  z(0)  is  the  focal  plane  for  a  focus 
formed in free space. It can also be seen from eq 19 that
B(1)  (sx,  sy,  v)  obeys  the  transversality  condition  of  eq  6.  Examples 
of  focused  angular  spectra,  with  the  central  Cassegrain  obstruction 
included,  are  shown  in  Figure  4. 
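
To make the role of the aperture and the central obstruction concrete, the following sketch evaluates an illustrative focused angular spectrum for a y-polarized pupil field. It assumes the lossless aplanatic weighting f_s = f_p = 1/sqrt(sz) with unit focal length, the basis vectors v_s = (-sy, sx, 0)/ρ, v_p = (-sx, -sy, 0)/ρ, v_p' = (-sx sz, -sy sz, ρ²)/ρ, and a hard-edged annular aperture; the function name and the hard-edge treatment are assumptions made for illustration.

```python
import math


def dot(a, b):
    """Euclidean dot product of two 3-tuples."""
    return sum(x * y for x, y in zip(a, b))


def focused_spectrum(sx, sy, na, obstruction_fraction=0.2, pupil=(0.0, 1.0, 0.0)):
    """Illustrative angular spectrum amplitude B for one ray direction (sx, sy).

    Rays outside the NA or inside the central obstruction contribute nothing.
    """
    rho2 = sx * sx + sy * sy
    rho = math.sqrt(rho2)
    if rho >= na or rho <= obstruction_fraction * na:
        return (0.0, 0.0, 0.0)
    sz = math.sqrt(1.0 - rho2)
    v_s = (-sy / rho, sx / rho, 0.0)
    v_p = (-sx / rho, -sy / rho, 0.0)
    v_p_prime = (-sx * sz / rho, -sy * sz / rho, rho)
    a_s = dot(v_s, pupil)          # s-polarized part of the pupil field
    a_p = dot(v_p, pupil)          # p-polarized part of the pupil field
    f = 1.0 / math.sqrt(sz)        # assumed lossless aplanatic weighting, unit focal length
    return tuple(f * (a_s * vs_i + a_p * vpp_i) for vs_i, vpp_i in zip(v_s, v_p_prime))


if __name__ == "__main__":
    print("NA 0.2 ray:", focused_spectrum(0.15, 0.0, 0.2))
    print("NA 0.9 ray:", focused_spectrum(0.0, 0.8, 0.9))
```

In this sketch a low-angle ray keeps the field essentially y-directed, whereas a high-angle ray in the plane of polarization acquires a substantial z component, in line with the trend described for large apertures.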

In free space, the illuminating angular spectrum B(1)(sx, sy, v)
completely  defines  the  field.  The  presence  of  the  layered 
sample  alters  the  field  in  a  manner  that  may  be  calculated  for 
each  planewave  component  separately,  as  described  above.  The 
resultant  focused  field  is  then  found  by  summing  the  planewave 
contributions  in  the  resulting  angular  spectra.  An  example  of 
a  field  focused  into  a  layered  sample  is  shown  in  Figure  5. 

The analysis to this point has addressed a planewave normally
incident on the entrance of the condenser Cassegrain. At close to
normal incidence, a slightly off-axis illumination results in the field

$$ \mathbf{P}(s_x, s_y, \bar{\nu}) = \mathbf{P}_0(s_x, s_y, \bar{\nu}) \exp\left[-i 2\pi \bar{\nu} (s_x x_0 + s_y y_0)\right] \tag{21} $$

at the entrance pupil, where x0 and y0 determine the inclination
of the beam and the normally incident field is P0(sx, sy, v).
Carrying the illumination of eq 21 through eqs 13 and 4 shows
that the inclination will have the effect of spatially displacing the
focused field by r0 = (x0, y0, 0)T. In this manner, the angle of
incidence of light on the entrance pupil governs the transverse
position of the focused field. The use of aplanatic optics gives
minimal distortions in the translated field.23

Figure 4. Normalized angular spectra B(1)(sx, sy, v) resulting from a
y-polarized planewave on the entrance pupil. The (a,d,g) x components,
(b,e,h) y components, and (c,f,i) z components of B(1)(sx, sy, v)
are plotted separately. Focusing numerical apertures (NAs) of (a-c)
0.9, (d-f) 0.5, and (g-i) 0.2 are illustrated, and in each case the NA
of the central obstruction is 20% of the total NA. Note that for large
apertures, the y-polarized field on the entrance pupil produces
significant x- and z-directed fields on the exit pupil. In transflection
mode, one-half of the apertures above would be used for illumination,
with the other half reserved for collection.

In widefield imaging, light is incident on the illumination Cassegrain
at a range of angles simultaneously. Fields associated with
distinct illumination angles are generally statistically uncorrelated,
meaning that each can be considered individually. The resultant
intensities on the detector (see the following subsection) sum,
without interfering, in the process of data collection. Similarly, for
unpolarized illumination, an x-polarized illumination field and a
y-polarized illumination field are present simultaneously. Each of these
can also be analyzed independently and the measured intensity of
each summed (an incoherent sum) to give the total signal.

Detection. The field at the detector may be related to the field
emerging from the sample in much the same way that the
illuminating field is found from the field in the entrance pupil. In
transmission mode, the field exiting the Cassegrain objective,
denoted Q(sx, sy, v), is dependent on the emerging angular
spectrum B(L)(sx, sy, v). Similar to eq 13,

$$ \mathbf{Q}(s_x, s_y, \bar{\nu}) = \mathbf{C}_T(s_x, s_y, \bar{\nu})\, \mathbf{B}^{(L)}(s_x, s_y, \bar{\nu}) \tag{22} $$


In the transflection mode, the field exiting the sample is the
upward propagating field determined by the angular spectrum
B(1)(sx, sy, v) and

$$ \mathbf{Q}(s_x, s_y, \bar{\nu}) = \mathbf{C}_R(s_x, s_y, \bar{\nu})\, \mathbf{B}^{(1)}(s_x, s_y, \bar{\nu}) \tag{23} $$

As with the illumination matrix CI(sx, sy, v), the matrices
CT(sx, sy, v) and CR(sx, sy, v) describe the focusing optics for each
case. In transmission, CT(sx, sy, v) describes focusing of the
downward propagating light transmitted through the sample
and substrate, while in transflection CR(sx, sy, v) describes the
focusing of the upward propagating reflected light.

The angular spectra emerging from the sample define the field
incident on the objective Cassegrain. Similar to eq 18, this incident
field can be expressed as Q'(sx, sy, v) = B(L)(sx, sy, v) sz(L)/ℓ in
transmission mode and as Q'(sx, sy, v) = B(1)(sx, sy, v) sz(1)/ℓ in
transflection mode. The mapping of the diverging field
Q'(sx, sy, v) through an ideal objective Cassegrain to the
collimated field Q(sx, sy, v) obeys the same relation as was given
in the illumination case, i.e., eq 17. Assuming that the last layer
of the system is free space, the transmission mode relation
CT(sx, sy, v) may therefore be represented compactly in the
bases defined in eqs 14-16,
Figure 5. The focused field magnitude |E(x, y, z, v)| for v = 1000
cm⁻¹ (a free space wavelength of 10 μm), the four-layer object of
Figure 2 and the angular planewave spectrum of Figure 4d-f plotted
on a normalized scale. The free space focal plane is at z(0) = 0. One
transverse-axial (x-z) section is plotted at y = 0 and three
transverse-transverse (x-y) sections are plotted at z = -15 μm, z
= 0, and z = 15 μm. In this example, the Cassegrain pupil is filled,
i.e., there are no apertures limiting the width of the illuminating beam
before the focusing optics.


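$$ \mathbf{C}_T(s_x, s_y, \bar{\nu}) = f_s'(s_x, s_y, \bar{\nu})\, \mathbf{v}_s (\mathbf{v}_s')^T + f_p'(s_x, s_y, \bar{\nu})\, \mathbf{v}_p (\mathbf{v}_p')^T \tag{24} $$

where

$$ f_s'(s_x, s_y, \bar{\nu}) = f_p'(s_x, s_y, \bar{\nu}) \propto \frac{\sqrt{s_z^{(L)}}}{\ell} \tag{25} $$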


for the ideal Cassegrain objective. Notice that for transmission
with no sample or substrate (the empty instrument case),
B(L)(sx, sy, v) = B(1)(sx, sy, v), leading to the result P(sx, sy, v) =
Q(sx, sy, v). This is to be expected; with no sample or substrate,
propagation through the focusing system has no net effect.

In  transflection  mode  a  similar  relation  holds, 


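$$ \mathbf{C}_R(s_x, s_y, \bar{\nu}) = f_s'(s_x, s_y, \bar{\nu})\, \mathbf{v}_s (\mathbf{v}_s')^T + f_p'(s_x, s_y, \bar{\nu})\, \mathbf{v}_p (\mathbf{v}_p')^T \tag{26} $$

where fs'(sx, sy, v) and fp'(sx, sy, v) are as in eq 25 for ideal
collection, and the reflected s- and p-polarized basis vectors are
given by the expressions

$$ \mathbf{v}_s' = \mathbf{v}_s = \frac{1}{\sqrt{s_x^2 + s_y^2}}\, (-s_y,\; s_x,\; 0)^T \tag{27} $$

$$ \mathbf{v}_p = \frac{1}{\sqrt{s_x^2 + s_y^2}}\, (s_x,\; s_y,\; 0)^T \tag{28} $$

$$ \mathbf{v}_p' = \frac{1}{\sqrt{s_x^2 + s_y^2}}\, \left(s_x s_z^{(1)},\; s_y s_z^{(1)},\; s_x^2 + s_y^2\right)^T \tag{29} $$

As before, a prime indicates a vector on the sample side of the
Cassegrain.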

To achieve magnification, the field Q(sx, sy, v) is focused on
to a detector by a low-angle focusing system. For an imaging
system with magnification of M, the field on the detector plane
is given by

$$ \mathbf{D}(x, y, \bar{\nu}) \propto \iint_{\mathbb{R}^2} \mathbf{Q}(s_x, s_y, \bar{\nu}) \exp\left(i 2\pi \bar{\nu} s_z z_\Delta\right) \exp\left[\frac{i 2\pi \bar{\nu} (s_x x + s_y y)}{M}\right] ds_x\, ds_y \tag{30} $$

where sz is calculated as in eq 5 but with magnified values sx/M
and sy/M instead of sx and sy, and zΔ is the offset between the
detector plane and the focal plane. The focusing described in
eq 30 is of the same form as in eq 4 but with the focused spectrum
corresponding to Q(sx, sy, v), the field incident on the detector
focusing optics. This is justified for the focusing onto the
detector, as the fields on the entrance and exit pupils of
the low-angle focusing system are approximately equal, i.e.,
the focusing tensor C(sx, sy, v) is modeled as the identity
operator. Note that D(x, y, v), the field incident on the detector,
lies in the x-y plane as Q(sx, sy, v) is spanned by vs and vp (see
eqs 14, 15, 24, and 26).

The signal measured by an optical detector is proportional to
the intensity of the field integrated over the detector area, i.e.,

$$ I(\bar{\nu}) = \iint_{\Omega} \left|\mathbf{D}(x, y, \bar{\nu})\right|^2 dx\, dy \tag{31} $$

where Ω is the detector area. If the detector area is large
compared to the focal spot, then the region of integration above
can be replaced with the entire x-y plane. In this case,
Parseval's theorem can be applied to calculate the data in terms
of the collimated beam exiting the Cassegrain objective, i.e.,
the data become independent of the focusing on to the detector
with

$$ I(\bar{\nu}) = \iint_{\mathbb{R}^2} \left|\mathbf{Q}(s_x, s_y, \bar{\nu})\right|^2 ds_x\, ds_y \tag{32} $$

Here the recorded signals are simply the total intensity of the
collimated beam emerging from the collection Cassegrain.

Relating Theory to Current Practice. The molecular interpretation
of recorded data in microspectroscopy typically follows
that of bulk spectroscopy, in which the recorded signal intensity
is often interpreted using the expression

$$ I_s(\bar{\nu}) = \left|P(\bar{\nu})\right|^2\, T_s(\bar{\nu}) \exp\left[-4\pi \bar{\nu} k(\bar{\nu}) b\right] \tag{33} $$

Here P(v) is the illumination field amplitude, b is the effective
path length (nominally, the sample thickness for transmission and
twice the sample thickness for the transflection measurements),
and Ts(v) describes a net transmission or reflection coefficient.
To calculate absorption spectra, a background measurement
is first obtained,

$$ I_0(\bar{\nu}) = \left|P(\bar{\nu})\right|^2\, T_0(\bar{\nu}) \tag{34} $$

where T0(v) describes the transmission and reflection effects
for the experimental setup without the sample. The recorded
absorbance, A(v), is obtained from the normalized spectrum,

$$ A(\bar{\nu}) = -\log_{10}\left[\frac{I_s(\bar{\nu})}{I_0(\bar{\nu})}\right] = \frac{4\pi \bar{\nu} k(\bar{\nu}) b}{2.303} - \log_{10}\left[\frac{T_s(\bar{\nu})}{T_0(\bar{\nu})}\right] \tag{35} $$

The molar absorptivity is defined as

$$ a(\bar{\nu}) = \frac{4\pi \bar{\nu} k(\bar{\nu})}{2.303\, \rho} \tag{36} $$

where ρ is the concentration of the absorbing species. Finally, in
the ideal case where the sample-free transfer function, T0(v), is
equal to the transfer function with the sample, Ts(v), Beer's
law

$$ A(\bar{\nu}) = a(\bar{\nu})\, b\, \rho \tag{37} $$

can be recovered from eq 35.

With comparison of eqs 35 and 37, it may be recognized that
the recorded absorbance spectrum should be corrected for optical
effects to recover analytically meaningful spectra that are
independent of the instrument and the sample geometry. Comparing
eq 33 with the rigorous model in the previous section, it should
be noted that the simple model does not fully take into account
the structure of the object (beyond the path length b) nor the
real part of the refractive index. These two factors are known to
lead to interference and dispersion effects in bulk-sample
spectroscopy.30,31 Restated, in the simple model the transmission
or reflection coefficient Ts(v) is considered to be independent
of the sample geometry and the properties of the sample and
substrate. The model of eq 33 also does not account for the
angle(s) of illumination and detection; this can be particularly
important in microspectroscopy, as focusing results in simultaneous
illumination with waves of many incidence angles. The
impact of neglecting these factors on the data is apparent in the
simulations that follow. The various effects leading to spectral
distortions are identified and systematically quantified through
the use of the developed model.

In the rest of this article, the model described is first
experimentally validated using a benchmarking sample. The
effects of the focusing optics of the imaging system are then
isolated in simulation by considering a hypothetical idealized
sample that eliminates the sample-induced distortions described
by the second term in eq 35. Transmission and transflection
geometries using common substrates are then simulated and it
is seen that transflection measurements in particular are susceptible
to sample-induced distortions. These distortions are seen to
be exacerbated if a substrate of intermediate index is used. Further
sample-induced distortions are predicted if an air gap is present
between the substrate and the sample. Finally, the correspondence
between a simplified single-ray model and the fully focused model
is examined.

(30) Allara, D. L.; Baca, A.; Pryde, C. A. Macromolecules 1978, 11, 1215-1220.
(31) Zhang, Z. M. J. Heat Transfer 1997, 119, 645-647.


3480  Analytical  Chemistry,  Vol.  82,  No.  9,  May  1,  2010 


Figure 6. The (a) complex refractive index of toluene32 and (b) the
magnitude of the normal-incidence complex transmission and reflection
coefficients for an air-toluene boundary. The Supporting Information
includes graphs of the refractive indices of the other materials
considered in this article.

To  make  predictions  from  the  theoretical  model,  toluene  is 
used  as  a  homogeneous  sample  of  interest.  Toluene  exhibits 
distinct  and  clearly  identifiable  absorption  modes  of  varying 
strength  over  the  entire  mid-IR  region,  making  it  an  ideal  sample. 
In  addition,  toluene  has  a  well  characterized  complex  refractive 
index32  [shown  in  Figure  6a]  which  is  publicly  available.33 
Refractive  index  changes  and  anomalous  dispersion34  in  the 
vicinity  of  absorption  peaks  can  be  clearly  observed  [e.g.,  see  inset 
in  Figure  6a] .  This  variation  of  refractive  index,  in  both  the  real 
and  imaginary  parts,  affects  the  recorded  data.  A  simple  illustration 
of  these  effects  is  shown  in  Figure  6b,  where  the  transmission 
and  reflection  coefficients  at  an  air-toluene  boundary  can  be  seen 
to  have  structure  influenced  by  the  dispersive  real  index  profile. 
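
The dependence of the boundary coefficients on the complex index can be illustrated with the standard normal-incidence Fresnel formulas for light passing from air (index 1) into a medium of complex index n + ik: r = (1 - n)/(1 + n) and t = 2/(1 + n). The index values below are hypothetical placeholders, not the tabulated toluene data, and the function name is illustrative.

```python
def fresnel_normal(index):
    """Normal-incidence field reflection and transmission coefficients, air -> medium."""
    n = complex(index)
    r = (1.0 - n) / (1.0 + n)
    t = 2.0 / (1.0 + n)
    return r, t


if __name__ == "__main__":
    # Hypothetical indices: away from a band, and near an absorption band.
    for label, index in (("off-band", 1.5), ("near band", complex(1.5, 0.1))):
        r, t = fresnel_normal(index)
        print(f"{label}: |r| = {abs(r):.4f}, |t| = {abs(t):.4f}")
```

Because both coefficients depend on the full complex index, any dispersion in the real part near an absorption band changes |r| and |t| with wavenumber even at a single boundary.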

Experimental Validation of the Model. The model presented
here was validated by performing microspectroscopy measurements
on a well characterized benchmarking sample. The sample
was designed such that both transmission and transflection
measurements would result in significant signal. The model
should then be able to accurately predict both sets of data from
a single description of the sample-substrate system.

(32) Bertie, J. E.; Jones, R. N.; Apelblat, Y.; Keefe, C. D. Appl. Spectrosc. 1994,
48, 127-143.

(33) http://keefelab.cbu.ca/?page_id=19.

(34) Saleh, B. E. A.; Teich, M. C. In Fundamentals of Photonics; Wiley-
Interscience: New York, 1991; Chapter 5, pp 176-179.

The sample was prepared by first forming a thin (≈75 nm)
germanium layer by sputter coating on a barium fluoride (BaF2)
disk. A common photoresist material, SU-8 2000.5 (MicroChem
Corp., Newton, MA), was spin coated to an approximate
thickness of 10 μm and pattern cured by UV exposure using a
standard USAF 1951 target (Edmund Optics, Barrington, NJ).
Postcuring, the entire sample was baked at 95 °C and developed
as per standard protocols.35 A postbake at 150 °C for 5 min
was performed to ensure complete polymerization and long-term
stability.

The sample data were recorded on a Varian Stingray system
using a mid-IR interferometer and microscopy with glass apertures.
A narrowband, liquid nitrogen cooled detector is used to
record spectra. Data are recorded at an undersampling ratio of 2
referenced to the He-Ne laser, zero-filled by a factor of 2, and
transformed using Happ-Genzel apodization. Single beam spectra
acquired for the sample (a position near the center of a larger
region of SU-8 and far from any transverse features) and background
(a position with no SU-8) are subsequently converted to
absorbance spectra. Both transmission and transflection mode
data were acquired without perturbing the sample.

The  SU-8  polymer  is  the  sample  layer  to  be  characterized,  while 
the  refractive  indices  are  known  for  the  thin  germanium36  layer 
and  the  barium  fluoride37  substrate.  Background  single  beam 
spectra,  Figure  7d,  are  recorded  on  a  region  of  the  sample  without 
SU-8,  and  the  sample  single  beam  spectra,  Figure  7a,  are  recorded 
on  a  region  of  the  sample  with  SU-8.  If  absorbance  calculations 
are performed according to eq 35, the transmission and transflection
results, plotted in Figure 7c, are not consistent. Significant
interference  effects  are  visible,  and  peak  shapes,  locations,  and 
heights  can  be  seen  to  differ  significantly,  despite  the  fact  that 
the  measurements  were  taken  from  the  same  sample. 
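
The conventional absorbance calculation referred to here can be sketched as follows, assuming the simple bulk model in which the sample signal is |P|² Ts exp(-4πvkb), the background is |P|² T0, and the absorbance is A = -log10(Is/I0) = 4πvkb/2.303 - log10(Ts/T0). The numerical values are made-up examples, and the function names are illustrative. The sketch shows that any mismatch between Ts and T0 appears directly as an additive error in the recorded absorbance.

```python
import math


def absorbance(sample_single_beam, background_single_beam):
    """Pointwise A = -log10(I_s / I_0) from two single beam spectra."""
    return [-math.log10(s / b) for s, b in zip(sample_single_beam, background_single_beam)]


def simple_model_absorbance(wavenumber_cm1, k, path_um, t_s=1.0, t_0=1.0):
    """Bulk-model absorbance: absorption term plus the transfer-function mismatch term."""
    path_cm = path_um * 1e-4  # micrometers -> centimeters
    absorption = 4.0 * math.pi * wavenumber_cm1 * k * path_cm / math.log(10.0)
    return absorption - math.log10(t_s / t_0)


if __name__ == "__main__":
    # Made-up example: 1000 cm^-1, k = 0.01, 10 um path, 10% lower transfer function.
    ideal = simple_model_absorbance(1000.0, 0.01, 10.0)
    distorted = simple_model_absorbance(1000.0, 0.01, 10.0, t_s=0.9, t_0=1.0)
    print(f"Beer's law value: {ideal:.4f}, with Ts/T0 = 0.9: {distorted:.4f}")
```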

To correctly interpret the data, it is necessary to include optical
effects, as modeled in this work. As a first step, the source
spectrum, |P(sx, sy, v)|², was determined from the background
measurement. It is assumed that the illumination is constant
across the entrance pupil so that the spectrum is not dependent
on sx and sy. The numerical aperture of the Cassegrain and the
Cassegrain obstruction were found to be best modeled as 0.40
and 0.26, respectively. The expected reflection and transmission
coefficients from the air, germanium, barium fluoride, air
system were calculated using the microspectroscopy model and
divided out (see eq 34). The resulting transmission and transflection
single beam spectra of the source are shown in Figure 7g.
Since the instrument uses different optical paths for the
transmission and transflection measurements, these two source
spectra cannot be expected to be equal or proportional. It can,
however, be seen that the source spectral profiles are qualitatively
consistent, which was not the case before transmission and
reflection effects were considered (see Figure 7d).

Once  the  source  profiles  are  established,  a  preliminary  estimate 
of  absorbance  can  be  found.  Data  were  predicted  by  modeling 
the  sample  index  as  purely  real  with  n0(v )  =  1.4017.  The  data 

(35) Processing Guidelines for SU-8 Permanent Epoxy Negative Photoresist,
http://www.microchem.com/products/pdf/SU-82000DataSheet2000thru2015Ver4.pdf.

(36) Barnes, N. P.; Piltch, M. S. J. Opt. Soc. Am. 1979, 69, 178-180.

(37) Malitson, I. H. J. Opt. Soc. Am. 1964, 54, 628-632.
Figure 7. Experimental data and quantities used in the modeling of the benchmarking sample: (a) the measured single beam spectra for the
substrate and sample (SU-8 polymer layer), (d) the measured single beam spectra for the substrate alone, (c) the absorbance as calculated
from the ratios of the single beam spectra using eq 35, (g) the source spectra |P(v)|², as calculated by compensating the background spectra
for the effects of the substrate, (b) the single beam transmission spectrum predicted by the model, compared to that measured, (e) the single
beam transflection spectrum predicted by the model, compared to that measured, (h) the residual differences between the plots in parts b and
e, (f) the complex refractive index calculated using the microspectroscopy model and Kramers-Kronig analysis, (i) the recovered absorbance
of SU-8, as calculated using the imaginary part of the refractive index shown in part f and eq 35.


predicted from the uniform index were used as an improved
background measurement which captures interference effects,
such as those seen in Figure 7a but not in the original
background measurement of Figure 7d. With the use of eq 35,
an estimate of the imaginary index k(v) can be found from the
transmission data. Applying a Kramers-Kronig calculation38 to
k(v) gives an estimate of the variations of n(v) about the
underlying constant value n0(v). The estimate of the refractive
index, n(v) + ik(v), is then used to predict the single beam
transmission spectrum. The difference between this prediction
and the measurement is then used to update the absorbance
and hence the imaginary index k(v). By iteration of the
algorithmic cycle, (1) update the absorbance estimate from the
difference between the measured and predicted transmission
data; (2) calculate the complex refractive index from the
absorbance using Kramers-Kronig analysis; (3) calculate the
predicted transmission data using the model and the complex
refractive index of the polymer, it is possible to converge on
the complex refractive index of the polymer.39,40

(38) Kuzmenko, A. B. Rev. Sci. Instrum. 2005, 76, 083108.
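
The three-step cycle can be sketched in a few lines. This is not the implementation used in the article: the full microspectroscopy model is replaced by a hypothetical stand-in forward model (bulk absorption plus Fresnel loss at two surfaces, with no interference fringes), the Kramers-Kronig step is a crude discrete principal-value sum, and all function names and parameter values are illustrative assumptions.

```python
import math


def kramers_kronig_delta_n(wavenumbers, k):
    """Crude discrete estimate of n(v) - n0 on a uniform grid (singular point skipped)."""
    dv = wavenumbers[1] - wavenumbers[0]
    delta_n = []
    for vi in wavenumbers:
        total = 0.0
        for vj, kj in zip(wavenumbers, k):
            if vj != vi:
                total += vj * kj / (vj * vj - vi * vi)
        delta_n.append(2.0 / math.pi * total * dv)
    return delta_n


def predicted_absorbance(wavenumbers, n, k, n0, b_cm):
    """Stand-in forward model: absorption plus two-surface Fresnel loss, no fringes."""
    r_0 = ((n0 - 1.0) / (n0 + 1.0)) ** 2          # background medium reflectance
    out = []
    for v, ni, ki in zip(wavenumbers, n, k):
        index = complex(ni, ki)
        r_s = abs((index - 1.0) / (index + 1.0)) ** 2
        bulk = 4.0 * math.pi * v * ki * b_cm / math.log(10.0)
        out.append(bulk - 2.0 * math.log10((1.0 - r_s) / (1.0 - r_0)))
    return out


def recover_index(wavenumbers, measured_a, n0, b_cm, iterations=20):
    """Iterate: predict data, update k from the mismatch, update n by Kramers-Kronig."""
    k = [0.0] * len(wavenumbers)
    n = [n0] * len(wavenumbers)
    for _ in range(iterations):
        predicted = predicted_absorbance(wavenumbers, n, k, n0, b_cm)
        k = [ki + (am - ap) * math.log(10.0) / (4.0 * math.pi * v * b_cm)
             for ki, am, ap, v in zip(k, measured_a, predicted, wavenumbers)]
        n = [n0 + dn for dn in kramers_kronig_delta_n(wavenumbers, k)]
    return n, k


if __name__ == "__main__":
    grid = [900.0 + 2.0 * i for i in range(101)]
    k_true = [0.05 * 25.0 / ((v - 1000.0) ** 2 + 25.0) for v in grid]   # synthetic band
    n_true = [1.5 + dn for dn in kramers_kronig_delta_n(grid, k_true)]
    data = predicted_absorbance(grid, n_true, k_true, 1.5, 10e-4)
    n_est, k_est = recover_index(grid, data, 1.5, 10e-4)
    print("max error in k:", max(abs(a - b) for a, b in zip(k_est, k_true)))
```

The usage example generates synthetic data from a known band with the same stand-in model, so it only demonstrates that the cycle is self-consistent; it says nothing about convergence for real measurements.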

The resulting complex index is plotted in Figure 7f and is used
to predict spectra for both transmission and transflection modes.
A good agreement, Figure 7b,e,h, is observed between the
predicted and observed data for both transmission and transflection.
The errors that are observed can likely be attributed to factors
such as sample variations, unmodeled elements in the instrument
optical path, and/or sample tilt. In Figure 7i the absorbance
spectrum, free of optical effects, as calculated from the recorded
data is shown. The agreement between predictions and measurements
validates the model by demonstrating predictive power
based on a physical description of the sample.

In  modeling  the  measurement  of  the  benchmarking  sample, 
the  refractive  index  of  the  cross-linked  SU-8  layer  was  estimated. 
This  estimate  shows  good  consistency  with  the  noncross-linked 


SU-8 measurements that appear in the literature.41,42 It should
also be noted that the background value of the refractive index,
n0(v), the exact thickness of the SU-8 layer (12.43 μm), and
the exact thickness of the germanium layer (79 nm) were
estimated  by  minimizing  the  difference  between  the  observed 
and  predicted  data.  The  resultant  values  are  consistent  with 
the  specifications  used  in  the  manufacture  of  the  sample.  It 
may  be  possible  to  further  improve  the  model  accuracy  by 
including  effects  such  as  the  nonuniform  illumination  of  the 
Cassegrain  aperture  (e.g.,  the  supports  for  the  secondary 
reflector  obstruct  a  small  portion  of  the  aperture).  Nevertheless, 
the  results  presented  here  indicate  that  the  level  of  modeling 
proposed  here  can  substantially  help  in  understanding  recorded 
data  as  well  as  optical  effects  in  IR  microspectroscopy. 

SIMULATION  AND  PREDICTION 

Instrumentation  Effects.  In  simulation,  it  is  possible  to  separate 
effects  due  to  the  sample  and  substrate  and  effects  due  to  the 
instrumentation.  Here,  this  is  first  accomplished  to  investigate  the 
dependence  of  the  measured  spectra  on  the  numerical  aperture  of 
the  imaging  system.  Sample-induced  distortions  can  result  from 
changes  in  reflection  and/or  transmission  coefficients  between  the 
background  and  sample  measurements,  an  effect  represented  in  the 
second  term  of  eq  35.  To  eliminate  these  coefficient  changes,  one 
can consider a nondispersive, weakly absorbing sample. Here the
imaginary part of the sample index, k(v), is taken to be 1/100 of
the imaginary part of the index of toluene, and the real part of the
refractive index, n(v), is taken to be 1. Note that this is not a physically
realizable material, as causality requires that changes in the absorption
must necessarily be associated with changes in the real part of
the refractive index.43 However, the minimal perturbations of the
complex  refractive  index  allow  the  isolation  of  instrument-induced 
changes  in  the  data. 

A 200 μm thick layer of the sample is taken to be in direct
contact with a substrate of barium fluoride for transmission
measurements and with a substrate of gold in transflection
measurements. The background measurements are simulated
with only the substrate present. The large sample thickness, paired
with the weak absorption, results in absorbance data comparable
to those expected from an ideal measurement of a 2 μm thickness
of toluene. The indices of barium fluoride and gold are calculated
using published coefficients37,44 in a Sellmeier equation. It should
be noted that in this article the Sellmeier equation for the real
index of barium fluoride has been extended beyond the transmission
window in order to allow a consistent comparison with
transflection systems for low wavenumbers. Numerical apertures
of  0.2,  0.5,  and  0.9  (as  in  Figure  4)  are  simulated  in  both 
transmission  and  transflection  geometries  and  the  illuminating 
light  is  taken  to  be  unpolarized.  Note  that  for  each  NA  the  central 
obscuration  produced  by  the  Cassegrain  is  taken  to  have  a  radius 
covering  20%  of  the  NA.  The  NA  of  0.5  is  similar  to  that  available 


(39)  Hawranek,  J.  P.;  Jones,  R.  N.  Spectrochim.  Acta  1976,  32A,  99-109. 

(40)  Hawranek,  J.  P.;  Neelakantan,  P.;  Young,  R.  P.;  Jones,  R.  N.  Spectrochim. 
Acta  1976,  32A,  85-98. 

(41)  Tan,  T.  L.;  Wong,  D.;  Lee,  P.;  Rawat,  R.  S.;  Patran,  A.  Appl.  Spectrosc.  2004, 
58,  1288-1294. 

(42)  Tan,  T.  L.;  Wong,  D.;  Lee,  P.;  Rawat,  R.  S.;  Springham,  S.;  Patran,  A.  Thin 
Solid  Films  2006,  504,  113-116. 

(43)  Toll,  J.  S.  Phys.  Rev.  1956,  104,  1760-1770. 


in  commercial  microspectroscopy  systems,  while  the  NAs  of  0.9 
and  0.2  provide  greater  and  lesser  comparisons. 

An estimate of the absorbance is found by evaluating eq 35,
and the results are displayed in Figure 8a. Note that the
absorbance has been normalized by the path length (in micrometers).
In this idealized example, a good agreement between the
measured absorbance and the actual absorbance is expected.
However, overestimation of the absorbance by an amount that
increases with the numerical aperture is predicted. This apparent
violation of Beer's law arises because of the increasing path length
through the sample associated with higher-angle rays, a phenomenon
predicted by Blout et al.11 The procedure of Blout et al.
accurately predicts the errors of Figure 8a; however, a more
general correction procedure must account for a coupling between
measurements, sample structure, and all angles of incidence, a
set of phenomena explored further in the following simulations.
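The path-length argument can be illustrated with a purely geometric estimate. The sketch below is not eq 35 and not the procedure of Blout et al.; it assumes a uniformly filled annular pupil, refraction into a sample of assumed real index 1.5, and a hypothetical function name `mean_path_factor`.

```python
import math

def mean_path_factor(na, n_sample=1.5, obscuration=0.2, steps=2000):
    """Average of 1/cos(theta_sample) over an annular cone of rays.

    Rays enter from air with sin(theta) between obscuration*na and na;
    Snell's law gives the angle inside the sample. Each ring of the pupil
    is weighted by its area element (an assumed uniform pupil fill).
    """
    s_min, s_max = obscuration * na, na
    ds = (s_max - s_min) / steps
    weighted_sum = 0.0
    weight_total = 0.0
    for k in range(steps):
        s = s_min + (k + 0.5) * ds                    # sin(theta) in air
        cos_in_sample = math.sqrt(1.0 - (s / n_sample) ** 2)
        weight = s                                    # annulus area element
        weighted_sum += weight / cos_in_sample
        weight_total += weight
    return weighted_sum / weight_total

factors = {na: mean_path_factor(na) for na in (0.2, 0.5, 0.9)}
```

A factor above 1 means the rays travel, on average, farther through the layer than its thickness, so an absorbance normalized by the thickness alone is overestimated, and more so at higher NA.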

Sample-Induced Distortions. Next, consider a toluene
sample (i.e., with the index illustrated in Figure 6) on barium
fluoride for transmission measurements and on gold for transflection
measurements. To investigate the effect of sample-induced
distortions, measurements are simulated for a variety of sample
thicknesses. The background measurements are taken by replacing
the sample with a medium of the same thickness as the sample
but with index of n = 1.47. These background measurements are
designed to represent an optimistic case, where Fabry-Perot type
fringing effects in the background cancel similar effects in the
sample measurement, giving a relatively good match between
$T_s(\bar{\nu})$ and $T_0(\bar{\nu})$ (see eq 35). Hence, experimentally observed
data will contain additional fringes arising from purely optical
effects. Various methods have been proposed for correction of
fringes.45 It must be noted, however, that explicitly accounting
for physical effects is likely to be more accurate than signal
processing methods alone, as was shown in Figure 7. The
illuminating light is taken to be unpolarized, while the NA of the
system is modeled as 0.5 (with a central NA of 0.1 obscured by
the secondary Cassegrain reflector).

The simulation results are shown in Figure 8b. In the
transmission experiments, the estimates of absorbance are
reasonably accurate. In transflection, however, errors in peak
position, peak height, and band shape are predicted in the
absorption spectra. Such distortions have also been observed
experimentally.46 As a consequence of the dispersion quantified
by the Kramers-Kronig relations,43 strong absorption peaks are
accompanied by sharp changes in the real refractive index (e.g.,
see Figure 6). This results in a significant difference between the
coefficients $T_0(\bar{\nu})$ and $T_s(\bar{\nu})$ seen in eq 35 and leads to distortions.
Furthermore, when the sample thickness is on the scale of the
wavelength, reflected and transmitted components interfere,
resulting in a complicated interplay of dispersion, sample geometry,
and absorption. The differences in the predicted spectra with
sample thickness stem from these phenomena.
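The interplay of interference, dispersion, and absorption in a thin layer can be sketched with a much simpler calculation than the focused model of this article: a single normal-incidence ray through one film, summed over all internal reflections (the standard Airy formula). The function names, the sign convention (absorption as a positive imaginary index), and the example indices are assumptions made for illustration.

```python
import cmath
import math

def film_transmittance(n_film, thickness_um, wavenumber_cm, n_in=1.0, n_out=1.0):
    """Normal-incidence intensity transmittance of one film between two
    semi-infinite, non-absorbing media (Airy summation of all reflections)."""
    k0 = 2.0 * math.pi * wavenumber_cm * 1.0e-4   # free-space wavenumber, rad/um
    beta = k0 * n_film * thickness_um             # complex phase across the film
    r12 = (n_in - n_film) / (n_in + n_film)       # Fresnel coefficients, entry face
    t12 = 2.0 * n_in / (n_in + n_film)
    r23 = (n_film - n_out) / (n_film + n_out)     # Fresnel coefficients, exit face
    t23 = 2.0 * n_film / (n_film + n_out)
    t = t12 * t23 * cmath.exp(1j * beta) / (1.0 + r12 * r23 * cmath.exp(2j * beta))
    return (n_out / n_in) * abs(t) ** 2

def apparent_absorbance(n_sample, n_background, thickness_um, wavenumber_cm,
                        n_in=1.0, n_out=1.0):
    """-log10 of the ratio of sample to background transmittance, where the
    background is a film of the same thickness with a real index."""
    t_s = film_transmittance(n_sample, thickness_um, wavenumber_cm, n_in, n_out)
    t_0 = film_transmittance(n_background, thickness_um, wavenumber_cm, n_in, n_out)
    return -math.log10(t_s / t_0)

def beer_absorbance(kappa, thickness_um, wavenumber_cm):
    """Ideal absorbance 4*pi*kappa*wavenumber*thickness / ln(10)."""
    return 4.0 * math.pi * kappa * wavenumber_cm * 1.0e-4 * thickness_um / math.log(10.0)

# Illustrative values only: a 2 um film in air on a substrate of index 1.45.
deviation = (apparent_absorbance(1.5 + 0.01j, 1.5, 2.0, 1000.0, n_in=1.0, n_out=1.45)
             - beer_absorbance(0.01, 2.0, 1000.0))
```

When the film is index matched to both surrounding media the apparent and ideal absorbances agree closely; any mismatch makes the result depend on the real part of the sample index, which is the coupling described above.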


(44) Ordal, M. A.; Long, L. L.; Bell, R. J.; Bell, S. E.; Bell, R. R.; Alexander,
R. W., Jr.; Ward, C. A. Appl. Opt. 1983, 22, 1099-1120.

(45) Griffiths, P. R.; de Haseth, J. A. In Fourier Transform Infrared Spectrometry,
2nd ed.; Wiley-Interscience: Hoboken, NJ, 2007; Chapter 11.1.3, pp 253-255.

(46) Gunde, M. K.; Aleksandrov, B. Appl. Spectrosc. 1990, 44, 970-974.

Analytical  Chemistry,  Vol.  82,  No.  9,  May  1,  2010  3483 


[Figure 8: panels (a)-(d), each with a Transmission plot and a Transflection plot; horizontal axis: Wavenumber (cm⁻¹), 1600 to 600.]


Figure 8. (a) Predicted absorbance for an idealized thick, low-absorption sample, normalized by the path length (in micrometers). Data are
plotted for both transmission and transflection modalities (and for a range of numerical apertures) as differences from the ideal absorbance
profile. In transmission, the substrate is barium fluoride, and in transflection, the substrate is gold. (b) Predicted path-length-normalized absorbance
deviations for a toluene sample and a range of sample thicknesses. In transmission, the substrate is barium fluoride, and in transflection, the
substrate is gold. (c) Predicted path-length-normalized absorbance deviations for a toluene sample and a range of sample thicknesses. In both
transmission and transflection, the substrate is germanium. (d) Predicted path-length-normalized absorbance deviations when there is an air
gap between the sample and the substrate. The air gap thickness is varied, while in all cases the sample thickness is 2 μm. In transmission, the
substrate is barium fluoride, and in transflection, the substrate is gold. The absorbance spectra used to calculate the deviations shown here are
plotted in the Supporting Information.


Sample-Induced Distortions for Substrates of Intermediate
Index. The appearance of the dispersion profile (the real part
of the refractive index) in absorption microspectroscopy measurements
has been described.13-15 It was noted that the estimate is
more susceptible to this consequence of dispersion in transflection
mode or when, for example, a higher-index substrate is used in
transmission. The dispersion influence can be explained, at least
in part, by effects such as those predicted in Figure 8a. Romeo
and Diem13 also observed a similar feature at sample edges; this
phenomenon is investigated in the follow-up article.22

If both the transmission and transflection substrates are
germanium (with background measurements taken on the bare
germanium), the spectra predicted are shown in Figure 8c. The
refractive index of germanium was calculated using published
coefficients36 for a Sellmeier model, and all other simulation
parameters are the same as for Figure 8b. The germanium
substrate can be seen to give seemingly confounding results; the
transflection spectra have severe distortions including negative
values of absorbance that are not physically realizable, while the
transmission spectra have distorted band shapes and amplitudes.
Hence, it is clear that the high index of germanium makes it
unsuitable for accurate transmission or transflection measurements
without corrections for optical distortions. An ideal transmission
substrate has an index matched to the sample, while an
ideal transflection substrate is, for example, a strong conductor.
At the toluene-germanium boundary both the transmission and
reflection coefficients are significant and both are relatively
sensitive to the sample index. As a result, the real part of the index
is strongly coupled into the measurement. This coupling is
particularly noticeable in the transflection measurement and
results in apparently negative absorbance measurements. The
transflection simulations of Figure 8b,c illustrate how the presented
framework can be used to examine spectral distortions
introduced in the transflection modality and suggest how explicit
optical modeling may be useful in the design of transflection
substrates.
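The sensitivity of a boundary to the sample index can be illustrated with normal-incidence Fresnel reflectance. The sketch below is not the focused model of this article; the rounded real indices (about 1.5 for toluene, 1.45 for barium fluoride, 4.0 for germanium) are assumed nominal values and the function names are hypothetical.

```python
def normal_reflectance(n_a, n_b):
    """Intensity reflectance at normal incidence for light going from n_a to n_b."""
    r = (n_a - n_b) / (n_a + n_b)
    return abs(r) ** 2

def reflectance_sensitivity(n_sample, n_substrate, step=1.0e-3):
    """Central finite-difference change in reflectance per unit change
    of the (real) sample index."""
    upper = normal_reflectance(n_sample + step, n_substrate)
    lower = normal_reflectance(n_sample - step, n_substrate)
    return (upper - lower) / (2.0 * step)

# Assumed nominal mid-IR indices, for illustration only.
N_TOLUENE, N_BAF2, N_GE = 1.5, 1.45, 4.0

r_baf2 = normal_reflectance(N_TOLUENE, N_BAF2)
r_ge = normal_reflectance(N_TOLUENE, N_GE)
s_baf2 = reflectance_sensitivity(N_TOLUENE, N_BAF2)
s_ge = reflectance_sensitivity(N_TOLUENE, N_GE)
```

With these assumed values, the nearly index-matched boundary reflects almost nothing and responds weakly to index changes, whereas the high-index boundary reflects a substantial fraction and responds much more strongly, so dispersion in the sample index feeds directly into the measured signal.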

Air-Gap-Induced Distortions. It is not uncommon in the
mounting of a sample on the substrate to introduce a small air
gap between the two; alternatively, the sample itself may contain
a void. In Figure 8d, spectral distortions caused by such voids
are shown for a variety of air gaps. The sample material is again
toluene, and the background measurements are taken without
accounting for the gap. Significant changes in peak shape,
amplitude, and position can again be seen in transflection. The
distortions are less severe in transmission although a significant
nonzero baseline is observed. Findings consistent with these
simulations have been reported with an underlying Matrigel layer
and observed to depend on layer thickness,14 as seen here.
However, in that work the effect was attributed to scattering
on the basis of a qualitative analysis. An alternative qualitative analysis
attributes distortions to contributions from reflections from the
top surface of a sample.15 It was also reported by Romeo and Diem
that poorly adhered or thin samples may produce a dispersive
line shape,13 consistent with results shown in Figure 8d. The
rigorous model developed here both accounts for the observed
results in a quantitative manner and acts as a guide to
understanding potentially confounding effects in sample preparation. An
understanding of this effect is especially relevant to cytological
analyses in which single cells are analyzed for malignancy. Sample
preparation becomes critical in those applications and has been
reported to be a major challenge in developing IR microscopy for
cytology.47 The effect on tissue samples can be expected to be less
drastic, as individual cell spectra are usually less important within
the greater tissue structure, and both the spectral and spatial
organization of the cells can be employed for effective diagnoses.48

Comparison with Bulk (Macro) Spectroscopy. The simulations
presented above have shown how sample structure and the
real (dispersive) part of the refractive index affect the recorded


(47)  Romeo,  M.;  Mohlenhoff,  B.;  Diem,  M.  Vib.  Spectrosc.  2006,  42,  9-14. 

(48)  Pounder,  F.  N.;  Bhargava,  R.  Submitted  for  publication. 


Figure 9. Magnitude of the difference between the data predicted
in Figure 8d (for an air gap of 2 μm thickness) and the data predicted
using a comparable single ray model.

spectral data. These effects produce apparent deviations from
Beer's law if the simple model of eq 33 is applied. The importance
of optical effects has been recognized for some time,49-51
particularly in reflection-based modalities, and algorithms38,52,53
have been developed to calculate the complex refractive index
from certain types of data measured in bulk spectroscopy. In
systems without tight focusing, this type of approach has been
applied to correct for the apparent artifacts21,39,40,54-59 and should
be used where possible. In addition to general interference and
dispersion effects (as observed without tight focusing in bulk-sample
spectroscopy), the model developed in this work takes
into account optical effects produced by the tightly focused
illumination and collection of light. If focusing effects are negligible
in comparison with the effects already modeled in bulk spectroscopy,
it can be expected that existing correction algorithms will
interpret microspectroscopy data correctly.

Figure 9 shows the difference between the focused-model, 2 μm
air-gap data of Figure 8d and those calculated for the same sample
but using a single representative ray path (i.e., a model without
focusing). For the transmission system, the representative ray path
is taken to be at normal incidence, and for the transflection system,
the median reflected path is chosen. The single-ray approach does
not capture effects due to focused path length difference (as
illustrated in Figure 8a) and, as seen in Figure 9, will not fully capture
the behavior of the tightly focused system. For this example, the

Figure 10. Ray-based and fully focused predictions for (a) transmission
and (b) transflection modalities and the experimental benchmarking
sample.

single-ray simplification produces errors in the transflection system
in particular. It is noteworthy that contemporary instruments can
produce signals with low enough noise to observe absorbance values
in the $10^{-1}$ to $10^{-4}$ range. Hence, these errors are significant, and
the detailed model developed here should be used.

Differences between the focused and single-ray models arise
from the angular dependence of the light-sample interaction. The
difference between the full model spectrum and that predicted
using a single ray model is shown in Figure 9. It is seen that this
angular dependence can be significant, particularly around regions
of high absorption. For the simpler samples considered in Figure
8b (i.e., air-sample-substrate systems with no air gap), the
angular dependence is less critical and gives maximum
path-length-normalized absorbance errors of 0.0044 in transmission and
0.011 in transflection, as compared to maximum errors of 0.0046
and 0.11 in transmission and transflection, respectively, in Figure
9. Conversely, many-layer sample-substrate systems, with comparable
transmission and reflection coefficients at layer boundaries,
may be highly sensitive to incidence angle and hence to
focusing effects. In Figure 10, a comparison between ray-based
and focused models for the experimental benchmarking system
(see Figure 7) is presented. For this more complicated
sample-substrate system, the use of the ray-based model introduces
significant errors. Focusing effects therefore play a very significant
role when modeling the benchmarking sample used in the
experimental validation of the model.

The analyses presented here should be used as a guide to
estimate the precision in the data. The first implication is that
the choice of sampling mode and/or substrate greatly influences
the magnitude and form of systematic error introduced into the
measurement. A second result demonstrates that there is a
dramatic difference in the precision achievable by transmission
mode and transflection mode microspectroscopy. The distortion
is nonlinear and not trivial to correct. One practical implication is
that the noise in the acquired data need be no smaller than the
observed deviation from the true spectrum. Any further reduction
in noise would leave the analytical conclusions limited by
systematic distortions rather than by random noise. In general, the
presented theoretical framework should be considered a starting
point for detailed optical modeling in specific studies. In biomedical
applications, where spectral assignments are challenging and
spectral changes are small, detailed modeling can be expected to
be important in understanding biochemical changes accurately.

CONCLUDING  REMARKS 

A  mathematical  model  for  mid-IR  microspectroscopy  has  been 
derived  by  solving  Maxwell’s  equations  in  layered  media  and  for 
focused  illumination  and  detection.  Predictions  given  by  this  model 
are  consistent  with  experimental  results  and  with  observations 
reported  in  the  literature.  It  is  seen  that  the  interplay  of  focusing, 
the  sample  geometry,  and  strong  dispersion  fully  accounts  for  the 
spectral  response  and  apparent  artifacts  for  simple  homogeneous 
systems.  Additional  spectral  effects  that  are  produced  by  scattering 
within  heterogeneous  materials  are  addressed  in  part  II  of  this  work. 

The model developed here can be applied to both transmission
and transflection collection geometries. While transmission spectra
demonstrated some robustness to distortions, transflection systems
were seen to be particularly sensitive to focusing, dispersion,
and sample-structure induced distortions. Ideally the distortions
observed may be corrected by mathematically inverting the
developed model, in order to estimate optical constants of the
sample directly. However, in many cases of interest, the sample
structure (i.e., the materials present in sample layers and the layer
thicknesses) may not be known. This complicates the inversion
process, as the sample geometry must be coestimated with the
optical constants of the material of interest.

Spectral distortions due to sample structure (e.g., interference
between interfaces) and dispersion have previously been reported
for systems that do not employ tight focusing. The model
presented here describes tightly focused fields throughout the
sample and also predicts focusing-dependent distortions that may
impact the measured spectra for certain sample geometries. In
comparison to typical experimental noise in modern IR microspectroscopy
systems, the effects were found to be significant.
Consequently, the model described provides a means to understand
distortions that may limit the analytical capability of IR
microspectroscopy.

Fourier transform infrared (FT-IR) spectroscopic imaging
combines the specificity of optical microscopy with the
spectral selectivity of vibrational spectroscopy. There is
increasing recognition that the recorded data may be
dependent on the optical configuration and sample morphology
in addition to its local material spectral response,
but a quantitative framework for predicting such dependence
is lacking. Here, a theory is developed to relate
recorded data to the spectral and physical properties of
heterogeneous samples. The modeling approach combines
optical theory through rigorous coupled wave analysis
with modeling of sampling geometry and sample
structure. The interplay of morphology and dispersion is
systematically explored using increasingly sophisticated
samples to illustrate the dependence of the detected
optical intensity on the spatial sample structure. Predictions
of spectral distortions arising from the sample
structure are quantified, and experimental validation of
the developed theory is performed using a microfabricated
standard from a commercial photoresist polymer. The
developed framework forms a basis for understanding
sample induced distortions in spectroscopic IR microscopy
and imaging.

Fourier transform infrared (FT-IR) spectroscopic imaging is a
rapidly emerging technology that combines the spatial specificity
of optical microscopy with the chemical selectivity of vibrational
spectroscopy.1-4 It is commonly misconceived that FT-IR imaging
is a simple extension of conventional infrared spectroscopy using
a different sampling accessory, namely a microscope. From the
optics perspective, similarly, it is tempting to conclude that FT-IR
imaging is an extension of optical microscopy with discrimination
of IR light by wavelength. In this series of articles, it is shown
that neither characterization is accurate. In the previous article,5
optical theory for IR microscopy was developed and it was
demonstrated that the combination of the sample-substrate
structure and optical configuration can result in significant
distortions in data recorded from homogeneous samples. Briefly,
optical theory was applied to model interrogation of a sample that
was assumed to consist of a homogeneous layer in a sample-substrate
structure with no transverse variation. The sample was
characterized by upper and lower boundaries and by the frequency
dependent complex relative permittivity $\epsilon(\bar{\nu})$, or equivalently, by
a constant complex refractive index.

In this article, the analysis is extended to heterogeneous
samples that vary in the lateral sample plane. The sample is
characterized by upper and lower boundaries as well as a
transverse structure defined by permittivity, $\epsilon(x, y, \bar{\nu})$. An example
of this type of structure is shown in Figure 1. While the sample
has nontrivial structure in the imaging plane, it is assumed to be
piecewise constant as a function of depth. Such a model is
appropriate for thin samples as are usually encountered in IR
microspectroscopy. This structure is amenable to analysis through
coupled wave theory. The notation used here is consistent with
the first article5 and is also listed in the glossary of Table S1 in
the Supporting Information.

In the earliest studies,6 it was noted that heterogeneous sample
structure distorts both the apparent spectrum and the apparent
spatial structure in FT-IR imaging. Other authors have also
attributed spectral distortions to heterogeneous sample structure.7-9
While experiments10 demonstrated that distortions arose from a
mismatch of refractive index between domains in the sample, a
complete theoretical model to predict the effects of heterogeneous
samples on observed spectra and spatial structure has not been
presented. The absence of such a model can lead to misinterpretation
of spatial structure and/or spectral changes observed at the
boundaries of domains. The full analytical capability of FT-IR
imaging can only be realized through proper modeling of the
optical physics of the combined sample-instrument system. These
models help, first, to understand the true spectral and structural
content of the data. Second, they help provide measures of the
systematic error due to distortions. Studies that claim chemical

Figure 1. An illustration of the type of sample and substrate
geometry considered in this article. Here the sample (a slab of finite
extent) is illuminated through a substrate. The sample layer is defined
by the permittivity $\epsilon(x, y, \bar{\nu})$, and the region of interest is of width $\Lambda_x$
in the x direction. In contrast to the previous article, the sample may
scatter light outside of the illumination angles.

or  structural  changes  at  edges  of  domains  might  employ  the  model 
reported  here  to  verify  that  the  magnitude  of  those  changes  is 
indeed  larger  than  those  due  to  optical  effects  alone.  In  this  article, 
optical  theory  for  analysis  of  heterogeneous  structures  in  mid-IR 
imaging  is  developed.  The  variation  of  certain  parameters  in  the 
model  is  predicted  to  lead  to  specific  distortions.  Predictions  are 
compared  to  experimental  data. 


THEORETICAL  MODEL 

In the preceding work,5 it was shown that each planewave
mode of the electric field (indexed by the propagation directions
$s_x$ and $s_y$) may be propagated through the sample-substrate
system independently. When transverse sample structure is
introduced, this is no longer true. Optical effects such as
scattering and refraction induce coupling between the modes.
These effects are calculated below using rigorous coupled wave
analysis,11-13 which was originally developed for modeling
diffraction gratings. While there are alternative methods which
could be used to solve the problem at hand,14-16 coupled wave
analysis provides a clear description of how the transverse
structure of the object couples planewave modes. The coupled
wave method is also widely used, and the associated numerical
implementation is well studied.17 In the following presentation,
rigorous coupled wave analysis is briefly described and applied
to the mid-IR imaging problem in order to explain artifacts, for
example, from edge scattering.

It is assumed that the transverse area of interest in the sample
is some finite range $\Lambda_x \times \Lambda_y$ in Cartesian coordinates x, y. The
object within this range can then be represented in the Fourier
series

$$\epsilon(x, y, \bar{\nu}) = \sum_{p=-N_U}^{N_U-1} \sum_{q=-N_W}^{N_W-1} \phi_{p,q}(\bar{\nu}) \exp[i(pUx + qWy)] \tag{1}$$

where $U = 2\pi/\Lambda_x$ and $W = 2\pi/\Lambda_y$. The Fourier series has been
truncated to $2N_U$ terms in the x direction and to $2N_W$ terms in
the y direction. Note that this representation repeats the object
periodically outside the region of interest. However, the
problem is formulated below so that light is focused into the
single period $\Lambda_x \times \Lambda_y$ with negligible intensity outside this area.
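The truncated Fourier representation can be sketched in one dimension for a slab of finite width inside a single period. The example below is an illustration under stated assumptions: the coefficients are obtained by midpoint-rule integration, the slab permittivity and period are made-up values, and the function names are hypothetical. The order range follows the truncation described above (p from -N to N-1).

```python
import cmath
import math

def fourier_coeffs(eps_of_x, period, n_u, samples=2048):
    """Fourier-series coefficients phi_p, p = -n_u .. n_u-1, of a periodic
    permittivity profile, by midpoint-rule integration over one period."""
    u = 2.0 * math.pi / period
    dx = period / samples
    coeffs = {}
    for p in range(-n_u, n_u):
        total = 0j
        for k in range(samples):
            x = (k + 0.5) * dx
            total += eps_of_x(x) * cmath.exp(-1j * p * u * x)
        coeffs[p] = total / samples
    return coeffs

def reconstruct(coeffs, period, x):
    """Evaluate the truncated series sum_p phi_p exp(i p U x)."""
    u = 2.0 * math.pi / period
    return sum(c * cmath.exp(1j * p * u * x) for p, c in coeffs.items())

# Assumed example: an absorbing slab filling the middle half of a 10 um period.
PERIOD_UM = 10.0
EPS_SLAB = 2.25 + 0.05j
EPS_SURROUND = 1.0

def slab_profile(x):
    inside = 0.25 * PERIOD_UM <= x < 0.75 * PERIOD_UM
    return EPS_SLAB if inside else EPS_SURROUND

phi = fourier_coeffs(slab_profile, PERIOD_UM, 32)
eps_center = reconstruct(phi, PERIOD_UM, 0.5 * PERIOD_UM)
```

The zero-order coefficient is the mean permittivity over the period, and the truncated series approaches the slab value away from the edges; the residual ripple near the edges is the usual consequence of truncating the series of a discontinuous profile.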

The reciprocal of the permittivity is well-defined and will be
useful in the analysis below. This function can also be represented
as the Fourier series,

$$[\epsilon(x, y, \bar{\nu})]^{-1} = \sum_{p=-N_U}^{N_U-1} \sum_{q=-N_W}^{N_W-1} \psi_{p,q}(\bar{\nu}) \exp[i(pUx + qWy)] \tag{2}$$

As  with  the  first  article,5  the  incident  field  is  decomposed  into  a 
collection  of  constituent  planewaves.  Each  individual  planewave 
component  is  infinite  in  extent  and  thus  impinges  on  the  periodic 
extension  of  the  sample  structure.  A  localized  response  is 
generated  by  summing  over  the  planewave  spectra  near  the  end 
of  the  calculation,  but  for  intermediate  steps,  it  is  useful  to  be 
able  to  appeal  to  the  formal  periodicity. 

Consider an incident planewave with Cartesian transverse
spatial frequency components $\delta$ and $\sigma$, that is, a field proportional
to $\exp[i(\delta x + \sigma y)]$ in a fixed z plane. The spatial periodicity of the
sample implies that the scattered field consists only of planewave
components with transverse spatial frequencies that are shifted
from those of the incident field by integer multiples of the
constants U and W. Explicitly,

$$u_p = pU + \delta \tag{3}$$

$$w_q = qW + \sigma \tag{4}$$

That is, through interacting with the sample, an incident planewave
with transverse frequencies $\delta$ and $\sigma$ must give rise to planewaves
with transverse dependence of the form $\exp[i(u_p x + w_q y)]$, due
to the translational periodicity of the problem. At p = q = 0,
the undiffracted component is obtained and all other values
represent diffracted modes.
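The shifted transverse frequencies can be enumerated directly. The sketch below treats one transverse dimension only and additionally labels each order as propagating or evanescent using the standard plane-wave condition that the transverse frequency must be smaller than the wavenumber in the medium; that condition, the names `delta`, `diffracted_frequencies`, and `propagating_orders`, and the numerical values are assumptions for illustration.

```python
import math

def diffracted_frequencies(delta, period_um, orders):
    """Transverse frequencies p*U + delta (rad/um) for the requested orders,
    with U = 2*pi/period and delta the incident transverse frequency."""
    u = 2.0 * math.pi / period_um
    return {p: p * u + delta for p in orders}

def propagating_orders(delta, period_um, wavenumber_cm, n_medium=1.0, max_order=20):
    """Orders whose transverse frequency is below the wavenumber in the medium,
    so that they propagate rather than decay away from the sample."""
    k = 2.0 * math.pi * wavenumber_cm * 1.0e-4 * n_medium   # rad/um
    freqs = diffracted_frequencies(delta, period_um, range(-max_order, max_order + 1))
    return [p for p, u_p in freqs.items() if abs(u_p) < k]

# Assumed example: normal incidence, 48 um period, 1000 cm^-1 (10 um wavelength) in air.
orders_in_air = propagating_orders(0.0, 48.0, 1000.0)
```

Order 0 always reproduces the incident transverse frequency, and the number of propagating orders grows with the ratio of the period to the wavelength in the medium.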

For reasons similar to those given above, the field in any fixed-z
plane of the sample must be composed of fields with the same
transverse frequencies given in eqs 3 and 4. Therefore, in the
sample layer (indexed by layer $\ell = A$), between $z^{(A-1)}$ and $z^{(A)}$,
the electric field vector can be written in the form

$$\mathbf{E}^{(A)}(x, y, z, \bar{\nu}) = \sum_{p} \sum_{q} \begin{bmatrix} X_{p,q}(z, \bar{\nu}) \\ Y_{p,q}(z, \bar{\nu}) \\ Z_{p,q}(z, \bar{\nu}) \end{bmatrix} \exp[i(u_p x + w_q y)] \tag{5}$$


Note that while the Fourier transform of the object function was
truncated in eq 1, the field resulting from scattering from this
approximation to the object need not be similarly band-limited.
However, it is necessary in the numerical calculation of
$\mathbf{E}^{(A)}(x, y, z, \bar{\nu})$ to make a potentially different truncation of eq
5. This truncation is made such that the diffracted-field coefficients
$X_{p,q}(z, \bar{\nu})$, $Y_{p,q}(z, \bar{\nu})$, and $Z_{p,q}(z, \bar{\nu})$ have decayed to a negligible
level before the truncation point.

In the inhomogeneous sample layer, the magnetic field is
nontrivially related to the electric field (cf. the relationship in a
homogeneous layer18). Hence it will be convenient to describe
the magnetic field separately as

$$\mathbf{H}^{(A)}(x, y, z, \bar{\nu}) = \sqrt{\frac{\epsilon_0}{\mu_0}} \sum_{p} \sum_{q} \begin{bmatrix} I_{p,q}(z, \bar{\nu}) \\ J_{p,q}(z, \bar{\nu}) \\ K_{p,q}(z, \bar{\nu}) \end{bmatrix} \exp[i(u_p x + w_q y)] \tag{6}$$

In the homogeneous layers, e.g., in the substrate or in the
homogeneous sample addressed in the preceding article, each
component of the planewave spectrum of the field can be
propagated independently and the results can be summed to find
the field at any given plane.19 In the structured sample considered
here, the relationship between fields in distinct transverse planes
is more complicated and this is reflected in the very general
dependence of eqs 5 and 6 on z. The evolution of the electric and
magnetic fields with z is found using the Maxwell-Faraday
equation and Ampere's circuital law. With the use of the time
harmonic form and the fact that $c = 1/(\epsilon_0 \mu_0)^{1/2}$, these can be
written

$$\nabla \times \mathbf{E}(\mathbf{r}, \bar{\nu}) = i2\pi\bar{\nu} \sqrt{\frac{\mu_0}{\epsilon_0}}\, \mathbf{H}(\mathbf{r}, \bar{\nu}) \tag{7}$$

$$\nabla \times \mathbf{H}(\mathbf{r}, \bar{\nu}) = -i2\pi\bar{\nu}\, \epsilon(\mathbf{r}, \bar{\nu}) \sqrt{\frac{\epsilon_0}{\mu_0}}\, \mathbf{E}(\mathbf{r}, \bar{\nu}) \tag{8}$$

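Substituting eqs 5 and 6 into eq 7 and equating coefficients for
each transverse frequency pair $(u_p, w_q)$ results in the equations

$$\frac{dX_{p,q}(z, \bar{\nu})}{dz} = i2\pi\bar{\nu} J_{p,q}(z, \bar{\nu}) + i u_p Z_{p,q}(z, \bar{\nu}) \tag{9}$$

$$\frac{dY_{p,q}(z, \bar{\nu})}{dz} = -i2\pi\bar{\nu} I_{p,q}(z, \bar{\nu}) + i w_q Z_{p,q}(z, \bar{\nu}) \tag{10}$$

$$K_{p,q}(z, \bar{\nu}) = \frac{1}{2\pi\bar{\nu}} [u_p Y_{p,q}(z, \bar{\nu}) - w_q X_{p,q}(z, \bar{\nu})] \tag{11}$$


(18) Equation 2 in ref 5.

(19) Equation 4 in ref 5.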

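Substituting eqs 1, 5, and 6 into eq 8 and equating transverse
frequency pairs for the x and y components of the vector equation
gives the equations

$$\frac{dI_{p,q}(z, \bar{\nu})}{dz} = -i2\pi\bar{\nu} \sum_{p''} \sum_{q''} \phi_{p-p'',q-q''}(\bar{\nu}) Y_{p'',q''}(z, \bar{\nu}) + i u_p K_{p,q}(z, \bar{\nu}) \tag{12}$$

$$\frac{dJ_{p,q}(z, \bar{\nu})}{dz} = i2\pi\bar{\nu} \sum_{p''} \sum_{q''} \phi_{p-p'',q-q''}(\bar{\nu}) X_{p'',q''}(z, \bar{\nu}) + i w_q K_{p,q}(z, \bar{\nu}) \tag{13}$$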
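The double sums over shifted indices express a general fact: the Fourier coefficients of a product of two periodic functions are the discrete convolution of their coefficients. A minimal one-dimensional sketch is given below; the dictionary-based storage, the function name `convolve_orders`, and the example profile are assumptions for illustration.

```python
def convolve_orders(phi, field, orders):
    """Coefficients of the product eps(x)*E(x) on the retained orders:
    result[p] = sum over p2 of phi[p - p2] * field[p2].
    Coefficients missing from either dictionary are treated as zero."""
    result = {}
    for p in orders:
        result[p] = sum(phi.get(p - p2, 0.0) * field.get(p2, 0.0) for p2 in orders)
    return result

# Assumed example: eps(x) = 2 + cos(U x), i.e. phi_0 = 2 and phi_{+1} = phi_{-1} = 0.5.
phi_example = {0: 2.0, 1: 0.5, -1: 0.5}
retained = range(-2, 3)

# A field containing only the undiffracted order p = 0 ...
from_order_0 = convolve_orders(phi_example, {0: 1.0}, retained)
# ... and a field containing only the order p = 1.
from_order_1 = convolve_orders(phi_example, {1: 1.0}, retained)
```

A laterally uniform layer has only a zero-order permittivity coefficient, so each field order then maps only onto itself; any lateral structure populates the off-diagonal terms and couples the orders.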

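Equating transverse frequency pairs, the z component in eq 8 can
be found by first dividing both sides of the equation by $\epsilon(\mathbf{r}, \bar{\nu})$.
The expression for the reciprocal of $\epsilon(\mathbf{r}, \bar{\nu})$, eq 2, can then be
used to give

$$Z_{p,q}(z, \bar{\nu}) = \frac{1}{2\pi\bar{\nu}} \sum_{p''} \sum_{q''} \psi_{p-p'',q-q''}(\bar{\nu}) [w_{q''} I_{p'',q''}(z, \bar{\nu}) - u_{p''} J_{p'',q''}(z, \bar{\nu})] \tag{14}$$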

The results seen in eqs 9-14 determine how the electric and
magnetic fields propagate through the sample layer. The dependence
on $K_{p,q}(z,\bar{\nu})$ and $Z_{p,q}(z,\bar{\nu})$ can be eliminated by substituting
eq 14 into eqs 9 and 10 and eq 11 into eqs 12 and 13. The
result is four sets of coupled first-order differential equations. The
$(u_p, w_q)$ frequency pairs retained in eqs 5 and 6 are then placed
in a one-dimensional ordering, indexed by m. Using this
one-dimensional ordering, each set of functions can be arranged as an
$N_F \times 1$ column vector, where $N_F$ is the number of terms
retained. These vectors can then be concatenated and the
system of differential equations written in the form


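$$\frac{d}{dz} \begin{bmatrix} \mathbf{X}(z,\bar{\nu}) \\ \mathbf{Y}(z,\bar{\nu}) \\ \mathbf{I}(z,\bar{\nu}) \\ \mathbf{J}(z,\bar{\nu}) \end{bmatrix} = i2\pi\bar{\nu}\, \boldsymbol{\Phi}(\bar{\nu}) \begin{bmatrix} \mathbf{X}(z,\bar{\nu}) \\ \mathbf{Y}(z,\bar{\nu}) \\ \mathbf{I}(z,\bar{\nu}) \\ \mathbf{J}(z,\bar{\nu}) \end{bmatrix} \qquad (15)$$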


where $\boldsymbol{\Phi}(\bar{\nu})$ is a $4N_F \times 4N_F$ matrix. For convenience, the
dependence of $\boldsymbol{\Phi}$ on $\bar{\nu}$ is suppressed for the remainder of this
work.

The form20 of $\boldsymbol{\Phi}$ guarantees that eigenvalues come in
pairs of opposite sign, i.e., the eigenvalues of $\boldsymbol{\Phi}$ can be
denoted by $\pm\gamma_1, \pm\gamma_2, \ldots, \pm\gamma_{2N_F}$. The eigenvectors of $\boldsymbol{\Phi}$ are
$\mathbf{g}_1, \mathbf{h}_1, \mathbf{g}_2, \mathbf{h}_2, \ldots, \mathbf{g}_{2N_F}, \mathbf{h}_{2N_F}$, where the vector $\mathbf{g}_j$ is associated
with the eigenvalue $\gamma_j$ and the vector $\mathbf{h}_j$ is associated with
$-\gamma_j$. The eigenvalue $\gamma_j$ is taken to lie in the upper half of
the complex plane, that is $\gamma_j$ is chosen such that its imaginary


(20)  Equation  57  in  ref  17. 
part is positive. Note that for purely real eigenvalues, $+\gamma_j$
will be chosen to be positive.
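
The sign convention above can be illustrated with a minimal Python sketch. The function names `forward_member` and `mode_factor` and the tolerance `tol` are hypothetical and introduced only for illustration; the sketch assumes that one member of each eigenvalue pair has already been computed by some eigenvalue solver.

```python
import cmath

def forward_member(eigenvalue, tol=1e-12):
    """Given either member of a +/- eigenvalue pair, return the one labelled
    +gamma_j: positive imaginary part, or positive real part when the
    eigenvalue is purely real (within the assumed tolerance tol)."""
    gamma = complex(eigenvalue)
    if abs(gamma.imag) > tol:
        return gamma if gamma.imag > 0 else -gamma
    return gamma if gamma.real >= 0 else -gamma

def mode_factor(gamma, wavenumber, distance):
    """exp[i 2 pi nu gamma (z - z0)] evaluated for a distance z - z0."""
    return cmath.exp(2j * cmath.pi * wavenumber * gamma * distance)
```

With this choice, a mode with a complex eigenvalue decays rather than grows as the distance increases, while a mode with a purely real eigenvalue keeps unit magnitude.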

Finding the eigenvalues and eigenvectors of $\boldsymbol{\Phi}$ allows the
matrix to be decomposed in the form

$$\boldsymbol{\Phi} = \mathbf{G} \boldsymbol{\Gamma} \mathbf{G}^{-1} \qquad (16)$$

where $\boldsymbol{\Gamma}$ contains the eigenvalues on the diagonal and is zero
elsewhere, and the vectors $\mathbf{g}_j$ and $\mathbf{h}_j$ are organized to form the
corresponding columns of $\mathbf{G}$.

An uncoupled set of $4N_F$ first order differential equations can
be written as a single matrix equation $d\mathbf{V}(z)/dz = i2\pi\bar{\nu} \boldsymbol{\Gamma} \mathbf{V}(z)$,
where $\mathbf{V}(z)$ is a $4N_F \times 1$ vector. Such a set of equations is easily
solved (each equation can be solved individually) and the result
used to construct a solution of eq 15. That solution can be
constructed as $\mathbf{G}\mathbf{V}(z)$ so that


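$$X_m(z,\bar{\nu}) = \sum_{j=1}^{2N_F} \left\{ \beta_j g_{j,m} \exp[i2\pi\bar{\nu}\gamma_j(z - z^{(\ell-1)})] + \hat{\beta}_j h_{j,m} \exp[-i2\pi\bar{\nu}\gamma_j(z - z^{(\ell)})] \right\} \qquad (17)$$

$$Y_m(z,\bar{\nu}) = \sum_{j=1}^{2N_F} \left\{ \beta_j g_{j,m+N_F} \exp[i2\pi\bar{\nu}\gamma_j(z - z^{(\ell-1)})] + \hat{\beta}_j h_{j,m+N_F} \exp[-i2\pi\bar{\nu}\gamma_j(z - z^{(\ell)})] \right\} \qquad (18)$$

$$I_m(z,\bar{\nu}) = \sum_{j=1}^{2N_F} \left\{ \beta_j g_{j,m+2N_F} \exp[i2\pi\bar{\nu}\gamma_j(z - z^{(\ell-1)})] + \hat{\beta}_j h_{j,m+2N_F} \exp[-i2\pi\bar{\nu}\gamma_j(z - z^{(\ell)})] \right\} \qquad (19)$$

$$J_m(z,\bar{\nu}) = \sum_{j=1}^{2N_F} \left\{ \beta_j g_{j,m+3N_F} \exp[i2\pi\bar{\nu}\gamma_j(z - z^{(\ell-1)})] + \hat{\beta}_j h_{j,m+3N_F} \exp[-i2\pi\bar{\nu}\gamma_j(z - z^{(\ell)})] \right\} \qquad (20)$$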

where $g_{j,m}$ is the mth element of the vector $\mathbf{g}_j$, $h_{j,m}$ is the mth
element of the vector $\mathbf{h}_j$, and $\beta_j$ and $\hat{\beta}_j$ are, as yet undetermined,
coefficients. The field in the sample layer is determined by eqs
17-20, with the z-polarized components given by eqs 11 and 14.
The sample structure determines the values for $\gamma_j$, $g_{j,m}$, and $h_{j,m}$
through the eigenvalue decomposition of $\boldsymbol{\Phi}$. The $4N_F$ remaining
coefficients ($2N_F$ $\beta_j$ coefficients and $2N_F$ $\hat{\beta}_j$ coefficients) are set
by boundary conditions, i.e., the illuminating field determines
these values.

A  representation  of  the  field  in  the  homogeneous  layers  (e.g., 
the  air  surrounding  the  substrate  and  sample  and  the  substrate) 
has  been  described  elsewhere19  and  can  be  rewritten  as 


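$$\mathbf{E}^{(\ell)}(x,y,z,\bar{\nu}) = \sum_{m=1}^{N_F} \left\{ \mathbf{B}^{(\ell)}(m,\bar{\nu}) \exp[i2\pi\bar{\nu} s_z^{(\ell)}(m,\bar{\nu})(z - z^{(\ell-1)})] + \hat{\mathbf{B}}^{(\ell)}(m,\bar{\nu}) \exp[-i2\pi\bar{\nu} s_z^{(\ell)}(m,\bar{\nu})(z - z^{(\ell)})] \right\} \exp[i(u_{p(m)} x + w_{q(m)} y)] \qquad (21)$$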

The modes of the field in the homogeneous layers are here
indexed by m, whereas in the previous article5 they were indexed
by the transverse propagation quantities $s_x$ and $s_y$. Writing the
modes in the manner above allows the field in the homogeneous
layers to be matched to the field in the sample. The
relationship between m and $s_x$ and $s_y$ is


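$$s_x(m,\bar{\nu}) = \frac{u_{p(m)}}{2\pi\bar{\nu}} = \frac{p(m)}{\Lambda_x \bar{\nu}} \qquad (22)$$

$$s_y(m,\bar{\nu}) = \frac{w_{q(m)}}{2\pi\bar{\nu}} = \frac{q(m)}{\Lambda_y \bar{\nu}} \qquad (23)$$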


where p(m) and q(m) describe the one-dimensional ordering of
(p, q) onto m. These equations describe the relationship between
the periodicity of the object ($\Lambda_x$ and $\Lambda_y$) and the transverse
propagation direction. The axial propagation factor $s_z^{(\ell)}(m,\bar{\nu})$
is calculated from a dispersion relation.21 In contrast to the
homogeneous layers, at a given transverse spatial frequency,
the field in the transversely inhomogeneous sample consists of
contributions of different axial propagation constants [compare
$s_z^{(\ell)}(m,\bar{\nu})$ in eq 21 to $\gamma_j$ in eqs 5, 6, and 17-20].
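
The mapping from a diffraction order to a propagation direction can be sketched in Python as follows. The dispersion relation itself is given in ref 5 and is not reproduced in this article, so the sketch assumes the standard form $s_z = (n^2 - s_x^2 - s_y^2)^{1/2}$ for a homogeneous layer of index n; the function names are hypothetical.

```python
import cmath

def transverse_cosine(order, period_cm, wavenumber_cm1):
    """s = p / (Lambda * nu_bar), with the period in cm and nu_bar in cm^-1."""
    return order / (period_cm * wavenumber_cm1)

def axial_factor(s_x, s_y=0.0, index=1.0):
    """Axial propagation factor under the assumed dispersion relation
    s_z = sqrt(n^2 - s_x^2 - s_y^2), taking the root with non-negative
    imaginary part (or positive real part when the root is real)."""
    n = complex(index)
    s_z = cmath.sqrt(n * n - s_x ** 2 - s_y ** 2)
    if s_z.imag < 0 or (s_z.imag == 0 and s_z.real < 0):
        s_z = -s_z
    return s_z
```

For a 200 μm period (0.02 cm) at 714 cm⁻¹, order p = 15 gives $s_x$ slightly above 1, so under this assumption the mode is evanescent in air (imaginary $s_z$) yet still propagates in a substrate of index 1.45.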

The field in a homogeneous layer is determined by the vectors
$\mathbf{B}^{(\ell)}(m,\bar{\nu})$ and $\hat{\mathbf{B}}^{(\ell)}(m,\bar{\nu})$. Just as $4N_F$ coefficients ($\beta_j$ and $\hat{\beta}_j$)
determine the field in the sample layer, transversality
conditions22 reduce the $6N_F$ elements of $\mathbf{B}^{(\ell)}(m,\bar{\nu})$ and $\hat{\mathbf{B}}^{(\ell)}(m,\bar{\nu})$
to $4N_F$ independent parameters. Thus, for a sample with L
layers, $4LN_F$ parameters fully describe the field. As with
the case considered in the preceding article, continuity of the
transverse electric and magnetic fields can be enforced at the
boundaries for each transverse spatial frequency to give
$4(L-1)N_F$ independent constraints. By construction, illumination
comes from only one side of the sample so the condition

$$\hat{\mathbf{B}}^{(L)}(m,\bar{\nu}) = 0 \qquad (24)$$

eliminates another $2N_F$ unknowns. The remaining $2N_F$ parameters
are determined by setting the illumination vectors
$\mathbf{B}^{(1)}(m,\bar{\nu})$, as described in the first article.5 The detection of
light scattered from the sample is also the same as in the
previous article.

For the homogeneous layers considered in the first article,
the continuous plane wave spectra19 could be evaluated numerically
by discretizing the transverse propagation cosines $s_x$ and $s_y$
on any grid. In the formulation described here, a natural
discrete grid is set by the periodic extension of the object on
length scales $\Lambda_x$ and $\Lambda_y$ (note that these may be chosen such
that the focused light is localized within a single period). This
grid may be more coarse than desired, particularly for small
values of $\bar{\nu}$. However, the discretization of the incident field in
the planewave basis, that is the discretization of $(s_x, s_y)$ in eqs
22 and 23, may be performed for multiple values of $(\delta, \sigma)$. The
resulting fields for each value of $(\delta, \sigma)$ may be summed, giving a
discretization of $(s_x, s_y)$ on an arbitrarily fine grid, that is at least
as fine as, and commensurate with, the discretization dictated by
$\Lambda_x$ and $\Lambda_y$.


SIMULATION  AND  PREDICTION 

Numerical  simulations  are  presented  here  to  demonstrate  how 
diffraction  and  scattering  effects  in  heterogeneous  samples  are 


(21)  Equation  5  in  ref  5 

(22)  Equations  6  and  7  in  ref  5. 


coupled to the sampling geometry, sample morphology, and
spectral profile of the sample such that the bulk or so-called "pure
phase" spectrum is changed. In the preceding article, it was seen
that transmission microspectroscopy is less sensitive to optical
distortions than transflection microspectroscopy when a homogeneous
layered sample is considered. Further, an overwhelming
majority  of  studies  using  IR  imaging  are  conducted  in  the 
transmission  mode.23  For  these  reasons,  the  transmission  mode 
is  considered  exclusively  in  the  following  examples.  The  extension 
to  transflection  is  straightforward. 

Measurements  from  two  samples  are  simulated  to  demonstrate 
the  potential  distortions  and  estimate  their  magnitude  in  a  first 
principles  manner.  In  the  first  example,  an  object  whose  response 
is  constant  across  all  wavelengths  is  considered.  Investigation  of 
focused  fields  in  the  sample  and  at  the  detector  illustrates  how 
the spatial structure of the sample affects measurements, independent
of the influence of spectral changes. In the second
example,  full  spectral  data  are  simulated  for  a  hypothetical  sample 
of  spatially  structured  toluene,  illustrating  the  increased  complexity 
when  spectral  variations  are  added  to  the  sample  structure.  Effects 
resulting  from  the  spatial  structure  of  the  sample  can  be  seen, 
and  the  associated  influence  on  recorded  spectra  are  investigated. 
In  both  examples,  the  effect  of  an  edge  on  the  microspectroscopy 
data  is  further  investigated.  While  sensitivity  to  only  the  imaginary 
(absorptive)  part  of  the  refractive  index  is  desired,  the  thickness 
of  the  sample  and  the  real  part  of  the  refractive  index  are  both 
seen  to  affect  the  data  through  scattering  and  diffraction.  These 
effects  result  in  changes  in  the  observed  spectral  features, 
including  changes  in  the  absorption  band  profiles  and  peaks  and 
also  changes  in  the  ratios  between  absorption  peaks,  which  are 
all  quantified. 

Frequency-Invariant Sample. In this first example, the
sample material considered has no variation in optical response
as a function of wavelength. By investigation of the interaction of
this sample with focused light of differing wavelengths, some basic
behaviors of the microspectroscopy system can be identified. The
sample considered is a rectangular slab of absorbing material with
index n = 1.4 + 0.07i mounted on a substrate of index 1.45 (i.e.,
the geometry shown in Figure 1). The slab is 100 μm wide in the
x direction and of infinite extent in the y direction, and various
thicknesses b in the z direction are considered. The area of interest
is taken to be $\Lambda_x$ = 200 μm wide in the x direction and infinite
in the y direction. The sample is illuminated through the
substrate with a y-polarized line-focus. A line-focus is constructed
by considering only the $s_y$ = 0 line of the aplanatic
Cassegrain angular spectrum.24 A Cassegrain with numerical
aperture of 0.5 and a central obscuration aperture of 0.1 is
considered. In representing both the object and the field, 200
Fourier series coefficients were retained, i.e., $N_U = N_F = 200$.
This level of detail gives sharp edges in the representation of
the sample, while increasing the number of Fourier terms did
not significantly change the simulation results, indicating that
200 coefficients are sufficient to represent the field. The offsets
$(\delta, \sigma)$ were dithered so that there were at least 50 sample points
within the numerical aperture of the Cassegrain for all values


(23)  Koenig,  J.  L.;  Wang,  S.-Q.;  Bhargava,  R.  Anal.  Chem.  2001,  73,  360A- 
369A. 

(24)  Figure  4  in  ref  5. 


of  v.  The  angular  spectrum  from  this  discretization  level  leads 
to  a  smooth  and  reasonable  focused  field. 
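
The need for dithering at low wavenumbers can be made concrete with a rough count of grid points inside the collection annulus. The Python sketch below is only one possible reading of the procedure: it assumes that each offset shifts the $s_x$ grid by an equal fraction of one grid step, that points are counted on both sides of the optical axis, and that the annulus is 0.1 < |$s_x$| < 0.5. All function names are hypothetical.

```python
import math

def points_in_aperture(period_cm, wavenumber_cm1, offset_fraction=0.0,
                       na_inner=0.1, na_outer=0.5):
    """Count grid points s_x = (p + offset) / (Lambda_x * nu_bar) that fall
    inside the annulus na_inner < |s_x| < na_outer."""
    spacing = 1.0 / (period_cm * wavenumber_cm1)
    p_max = int(math.ceil(na_outer / spacing)) + 1
    return sum(1 for p in range(-p_max, p_max + 1)
               if na_inner < abs((p + offset_fraction) * spacing) < na_outer)

def dithered_count(period_cm, wavenumber_cm1, n_offsets):
    """Total points when the grid is shifted by k / n_offsets of a step."""
    return sum(points_in_aperture(period_cm, wavenumber_cm1, k / n_offsets)
               for k in range(n_offsets))

def offsets_needed(period_cm, wavenumber_cm1, target=50):
    """Smallest number of equally spaced offsets reaching the target count."""
    n = 1
    while dithered_count(period_cm, wavenumber_cm1, n) < target:
        n += 1
    return n
```

Under these assumptions, a 200 μm period already provides more than 50 points at 3333 cm⁻¹ without dithering, whereas at 714 cm⁻¹ the undithered grid provides only 12 and several offsets are required.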

The line-focus is centered on the absorbing slab in Figure 2.
It should be noted that refraction in the substrate has the effect
of shifting the nominal focal point.25 Hence, the sample and
substrate have been moved in the axial direction here so that
focusing is achieved in the sample plane. This wavelength-dependent
(chromatic) shift of the focus has been noted to be a
significant problem for dispersive substrates.25,26 Here it is noted
that the substrate also introduces aberration,25,27 as can be seen
by comparing the fields of Figure 2 to fields without a substrate
(Figure S1 in the Supporting Information). When the line-focus
is positioned between the absorbing slabs, the results of Figure
3 are obtained, while focusing onto the edge of a slab gives the
fields shown in Figure 4.
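
The size of the focal shift, and the reason it is accompanied by aberration, can be estimated with elementary ray optics. The Python sketch below uses the standard geometric result for a ray crossing a plane-parallel plate; it is a ray estimate only, not the wave calculation used in the simulations, and the function name is hypothetical.

```python
import math

def focal_shift(thickness, index, sin_theta):
    """Axial displacement of the point where a ray of incidence angle theta
    crosses the axis after passing through a plane-parallel plate:
    t * (1 - cos(theta) / sqrt(n^2 - sin^2(theta)))."""
    cos_theta = math.sqrt(1.0 - sin_theta ** 2)
    return thickness * (1.0 - cos_theta / math.sqrt(index ** 2 - sin_theta ** 2))

# 2 mm (2000 um) substrate of index 1.45, as in the caption of Figure 2
paraxial = focal_shift(2000.0, 1.45, 0.0)   # about 621 um
inner = focal_shift(2000.0, 1.45, 0.1)      # about 624 um
outer = focal_shift(2000.0, 1.45, 0.5)      # about 727 um
```

The shift of about 640 μm quoted in the caption of Figure 2 lies between the ray estimates for the inner (0.1) and outer (0.5) numerical apertures. Because rays at different angles are displaced by different amounts, the substrate introduces aberration in addition to the shift.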

Several comments apply to all three line-focusing cases. Since
the sample and illumination have no spatial variation with y and
the illuminating light is y-polarized, the field in the sample is also
strictly y-polarized. Thus the plots shown are a complete
representation of the field. The theory does encompass more general
cases, e.g., x-polarized illumination or two-dimensionally focused
fields, but the resulting vector fields are more challenging to
display. The magnitude of the angular spectrum $B_y^{(3)}(s_x, \bar{\nu})$ is
shown in subplots j-l. These spectra can be interpreted as
representations of the field strength as a function of direction
of propagation. The fine oscillations observed in many of these
functions can be attributed to interference between unscattered
light and contributions scattered from edges of the slab. Any
components of the angular spectrum that lie outside the
collection angle of the detection Cassegrain are not collected
upon detection. This range is marked by the empty instrument
response (i.e., the instrument response with no sample or
substrate), in this case 0.1 < |$s_x$| < 0.5. Any light diffracted
outside the collection range leads to an apparent absorption,
as this light is not detected. It should also be noted that any
components at |$s_x$| > 1 correspond to waves that are evanescent
in free space and do not propagate to the detector. The intensity
of light on the detector plane can be calculated from the
emerging angular spectra, as described in the previous article.
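
The link between light leaving the collection cone and apparent absorption can be illustrated with a toy calculation. The Python sketch below makes a deliberately crude assumption that the power carried by each angular component is proportional to its squared amplitude, ignoring obliquity and detector effects; the function names and the example spectra are hypothetical.

```python
import math

def collected_power(spectrum, na_inner=0.1, na_outer=0.5):
    """Sum |amplitude|^2 over components with na_inner < |s_x| < na_outer.
    `spectrum` is a list of (s_x, amplitude) pairs."""
    return sum(abs(a) ** 2 for s, a in spectrum if na_inner < abs(s) < na_outer)

def apparent_absorbance(sample_spectrum, reference_spectrum):
    """-log10 of the ratio of collected sample power to reference power."""
    return -math.log10(collected_power(sample_spectrum)
                       / collected_power(reference_spectrum))

reference = [(0.2, 1.0), (0.3, 1.0), (0.4, 1.0)]
# Same total power, but 10% of it is redirected to s_x = 0.7 (not collected).
scattered = [(0.2, 1.0), (0.3, 1.0), (0.4, math.sqrt(0.7)), (0.7, math.sqrt(0.3))]
```

In this toy case nothing is absorbed, yet `apparent_absorbance(scattered, reference)` is about 0.046 because one tenth of the power no longer reaches the detector.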

For illumination focused into the center of the slab, fields
within the sample and the transmitted angular spectra are shown
in Figure 2. The penetration of the field through the sample is as
expected: thicker samples produce more attenuation and longer
wavelengths (i.e., lower values of $\bar{\nu}$) are more weakly absorbed.
Standing wave effects due to reflection off the top of the sample
are also clearly visible. For the thin sample (b = 2 μm) it can be
seen that there is minimal loss of intensity due to diffraction out
of the collection optics, while for thicker samples more light
escapes the collection cone. It should be noted that recorded
spectra in microspectroscopy are usually of lower signal-to-noise
ratio than the bulk recording case. Hence, the sample thickness
is usually adjusted such that the absorbance is maximized
while remaining within the linear regime of Beer's
law. The typical thickness for most samples is 5-10 μm and
feature sizes in many composites and biomedical samples are of
(25) Carr, G. L. Rev. Sci. Instrum. 2001, 72, 1613-1619.

(26) Wetzel, D. L. Vib. Spectrosc. 2002, 29, 291-297.

(27) Torok, P.; Varga, P.; Laczik, Z.; Booker, G. R. J. Opt. Soc. Am. A 1995, 12,
325-332.


Analytical  Chemistry,  Vol.  82,  No.  9,  May  1,  2010  3491 




Figure 2. Responses for a line-focused y-polarized field incident on the center (x = 0) of an absorbing slab. The slab has a complex index 1.4
+ 0.07i and is mounted on a substrate (the upper region of the plots) of index 1.45 and thickness 2 mm. The field is focused to the z = 0 plane
in free space. Focusing through the substrate has the effect of moving this focus by about 640 μm, as shown, and also introducing aberration
(cf., Figure S1 in the Supporting Information, which considers the same scenario but with the sample suspended in free space). Three sample
thicknesses are considered, 2 μm in the left column, 7 μm in the center column, and 15 μm in the right column, that span the usual range in
transmission measurements. The y-polarized field (the only nonzero field direction) in the sample is shown in parts a-i. The substrate boundary
is marked with a dashed line and the slab boundaries with solid lines. Wavelengths of (a-c) 3 μm ($\bar{\nu}$ = 3333 cm⁻¹), (d-f) 6 μm ($\bar{\nu}$ = 1667 cm⁻¹),
and (g-i) 14 μm ($\bar{\nu}$ = 714 cm⁻¹) are shown. The magnitude of the angular spectrum after the sample, $B_y^{(3)}(s_x, s_y, \bar{\nu})$, is shown in parts j-l.


a  similar  order  of  magnitude.  The  unfortunate  coincidence  of  order 
of  magnitude  for  wavelengths,  sample  features,  and  optimal  path 
length  has  an  impact  on  the  recorded  data  for  most  cases.  As 
this  simulation  demonstrates,  a  trade-off  between  random  error 
and  systematic  distortion  due  to  optical  effects  may  be  avoided  in 
some  cases  by  using  thinner  samples. 
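
The trends described above (more attenuation for thicker samples and weaker absorption at lower wavenumbers for a constant imaginary index) follow directly from the ideal Beer's law absorbance. The Python sketch below uses the standard relation A = 4πk$\bar{\nu}$b / ln 10, which is assumed here to correspond to the ideal absorbance referred to in this article; the function name is hypothetical.

```python
import math

def ideal_absorbance(k, wavenumber_cm1, thickness_cm):
    """Beer's law absorbance of a layer with imaginary index k:
    A = 4 * pi * k * nu_bar * b / ln(10)."""
    return 4.0 * math.pi * k * wavenumber_cm1 * thickness_cm / math.log(10.0)

# Slab of Figure 2 (k = 0.07) at the three simulated thicknesses, 3333 cm^-1
for thickness_um in (2.0, 7.0, 15.0):
    absorbance = ideal_absorbance(0.07, 3333.0, thickness_um * 1e-4)
    print(f"b = {thickness_um:4.1f} um: A = {absorbance:.3f}")
```

For the 2 μm slab this gives an absorbance of roughly 0.25 at 3333 cm⁻¹, and for the 7 μm slab roughly 0.89 at 3333 cm⁻¹ against roughly 0.19 at 714 cm⁻¹.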

In  Figure  3  the  same  system  is  considered  but  with  the 
illuminating  light  focused  between  two  slabs.  There  is  little  light 
incident  on  the  absorbing  material  and,  apart  from  a  reflection  at 
the  substrate  boundary,  the  focused  illumination  passes  through 


the  system  largely  unperturbed.  However,  for  the  thicker  samples 
some  scattering  effects  can  be  seen  in  the  resulting  angular 
spectra.  This  illustrates  how  the  optical  effects  produced  by  an 
edge  may  have  a  wider  region  of  influence  for  thicker  samples. 
The  implication  for  a  heterogeneous  material  is  that  the  influence 
of  domains  could  extend  well  beyond  their  obvious  morphologic 
boundaries  and  proximal  regions  in  a  manner  that  is  coupled  to 
the  thickness  of  the  sample.  While  dual  aperturing  is  used  in  point 
microspectroscopy  to  alleviate  these  effects  to  some  degree,  they 
will  be  readily  apparent  in  full-field  of  view  imaging. 
Figure 3. Responses for a line-focused y-polarized field incident between two absorbing slabs (x = 100 μm) separated by a distance several-fold
the wavelength. All other plots and parameters are the same as in Figure 2. A similar scenario, but with the sample suspended in free
space, is illustrated in Figure S2 of the Supporting Information.


The illuminating light is focused onto the edge of the sample
in Figure 4. In this case, as expected, significant changes to the
focused field can be observed. Some of the light is refracted into
the absorbing slab and bent out of the collection cone (this can
be seen in parts j-l particularly clearly). The resulting
sample-induced effects become progressively more prominent
with increasing sample thickness. The net effect of an edge is to
redistribute spatially the total intensity that would otherwise be
incident on the detector. If the light is redistributed outside of the
collection cone, the total intensity reaching the detector is
decreased and consequently the apparent absorption is increased.
This apparent increase in absorption is, however, due only to
optical effects and depends on the sample morphology. For
nonabsorbing spectral regions, the resulting imaging contrast is strong
at the edges of domains and is akin to that observed in optical

microscopy.  The  contrast  between  domains  is  dictated  by  their 
respective  refractive  indices.  While  the  obvious  implication  is  that 
an  IR  microspectrometer  may  be  used  in  the  manner  of  an  optical 
microscope  with  properties  in  the  mid-IR  region,  such  a  use  is 
not  very  practical.  The  primary  motivation  for  working  in  the  mid- 
IR  region  is  to  obtain  chemical  contrast  using  absorbance  of 
specific  chemical  species  in  spatial  domains.  Hence,  the  more 
important  implication  is  that  scattering  from  nonabsorbing  regions 
of  one  domain  can  influence  the  data  recorded  in  an  absorbing 
spectral  region  for  another  domain.  In  this  manner,  optical  effects 
complicate  data  interpretation  and  make  measurements  of  the 
spectrum  dependent  on  sample  structure. 

Animations  showing  the  interactions  of  the  line-focus  with  the 
sample  are  included  in  the  Supporting  Information.  There  is  an 
animation  for  each  combination  of  wavenumber  and  sample 
Figure 4. Responses for a line-focused y-polarized field incident on the edge (x = 50 μm) of an absorbing slab. All other plots and parameters
are the same as in Figure 2. A similar scenario, but with the sample suspended in free space, is illustrated in Figure S3 of the Supporting
Information.


thickness  seen  in  Figure  2,  and  a  second  animation  for  each 
combination  but  with  the  sample  suspended  in  free  space  rather 
than  on  a  substrate. 

When  the  diffracted  components  are  collected,  the  distribution 
of  light  intensity  in  the  detector  plane  is  also  affected,  meaning 
that  contributions  from  the  edge  effects  can  produce  artifacts  in 
pixels  besides  the  ones  associated  with  the  edge  position.  To 
understand  the  practical  effects  of  spatial  redistribution,  a  fully 
two-dimensional  focusing  solution  is  needed.  Hence,  a  full  focusing 
aperture,  rather  than  a  line-focus,  is  considered  for  the  remainder 
of  this  article.  In  all  cases,  circular  apertures  (as  shown  in  the 
previous  article24)  are  represented  on  a  discrete  Cartesian  grid, 
as  consistent  with  the  analysis  presented  in  the  previous  section. 
The  effects  of  light  redistribution  are  illustrated  in  Figure  5. 


The  object  from  Figure  2  is  considered  in  Figure  5  but 
represented  with  Nu  =  40  coefficients.  The  angular  spectrum 
of  illumination  is  discretized  so  that  for  any  wavenumber  v  the 
sx  diameter  across  the  aperture  is  at  least  20  pixels  and  the  sy 
diameter  is  20  pixels.  The  field  emerging  from  the  sample  is 
represented  using  an  angular  spectrum  discretized  with  the 
same  pixel  spacing  and  with  20  pixels  in  the  sy  dimension  and 
at  least  60  pixels  in  the  sx  dimension.  The  discretization 
described  here  is  more  coarse  than  that  used  in  the  previous 
calculations  of  the  fields  in  the  sample.  This  is  because  the 
predicted  detection  data  are  less  sensitive  to  fine  features  of 
the  field  (e.g.,  evanescent  waves)  so  that  the  desired  prediction 
ceases  to  change  with  the  discretization  at  a  more  coarse  level. 
The  outer  and  inner  Cassegrain  numerical  apertures  are  again 
Figure 5. Absorbance profiles of the 7 μm slab of Figure 2, at
different wavenumbers and for both (a) imaging and (b) point mapping
modalities.

0.5  and  0.1,  respectively,  which  means  that  the  discrete 
representation  of  the  scattered  fields  extends  well  into  the 
evanescent  region. 

Two modalities were simulated. First, the focus of an unpolarized
illuminating field was translated in small increments and
the emerging angular spectrum calculated. By calculation of the
total power throughput,28 a point mapping system was simulated,
where a large area detector was used and the Cassegrain edges
set the limiting apertures in the optical path. That is, the sample
was illuminated at a single spot and the transmitted light captured
using a single IR detector. Second, widefield illumination with
array detection was simulated. The transmitted angular spectrum
can be used to calculate the intensity on a detector plane,29 and
for each focal position these intensities can be summed. The fill
factor of the detector is not explicitly considered here but the
consequences of a nonunity fill factor can be included in the model
and are not expected to produce significant qualitative changes.
These two approaches, point mapping and imaging, are both
employed in contemporary microspectroscopy, and simulations
in Figure 5 for both modalities demonstrate similar results.
However, it is instructive to notice how the measured profile of
the slab depends on wavenumber. For example, both the gradient
of the absorbance at the slab edge and the overshoot at the edge
vary with wavenumber. While the wavenumber dependence of
the achievable spatial resolution is known,30 spectral measurements
also change with wavelength due to optical effects (such
as diffraction and refraction) and, additionally, with sample
structure (e.g., thickness). A description of spectral distortions
and their effect on spatial specificity (and, in turn, the resolution
attainable) is lacking. The model sample of constant $k(\bar{\nu})$ considered
here exhibits differing profiles due to wavelength dependent
phenomena, emphasizing this relationship between recorded
spectra and the apparent morphology of the sample.

Frequency-Variant Samples. To see how optical phenomena
influence a measured spectrum, the simulation parameters
described for Figure 5 were modified by replacing the constant index
of the slab with the complex refractive index of toluene31 and by
replacing the constant index of the substrate by the index of
barium fluoride.32 A background measurement was calculated by
applying standard transmission coefficients to model the transmission
of light through the air-to-barium-fluoride boundary at the
first substrate surface and the barium-fluoride-to-air boundary at
the second substrate surface. In the presence of the absorbing
toluene slab, both point mapping and imaging profiles were
calculated using the methods described above.

(28) Equation 32 in ref 5.

(29) Equations 30 and 31 in ref 5.

(30) Lasch, P.; Naumann, D. Biochim. Biophys. Acta 2006, 1758, 814-829.

Spectra from the imaging modality are shown in Figure 6. In
these calculations it was assumed that the pixel size was 5 μm at
the sample plane. Spectra are plotted for the center of the slab
and for measurements in the vicinity of the edge. It can be seen
that light scattered outside of the collection cone produces a
nonzero baseline in the measured spectra, as is commonly
observed. A smooth baseline function is often fitted to these
spectra and subtracted out before spectral metrics are calculated.
Here, local linear baselines are fitted to the spectra, as is common
practice in spectral preprocessing, and peak position and height
metrics calculated (as illustrated in Figure 6). The resulting peak
positions are given in Table 1, and the resulting normalized peak
heights are given in Table 2. In both cases an ideal value has
been calculated by determining the true absorbance profile from
the imaginary refractive index.33
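
A local linear baseline followed by peak picking can be sketched in Python as below. This is a generic illustration of the preprocessing step rather than the exact procedure used for Figure 6: the baseline is assumed to be the straight line through the two end points of a user-chosen window, and the synthetic Lorentzian band (center, width, and offset) is invented purely for the demonstration.

```python
def local_linear_baseline(x, y, left, right):
    """Straight line through (x[left], y[left]) and (x[right], y[right])."""
    slope = (y[right] - y[left]) / (x[right] - x[left])
    return [y[left] + slope * (xi - x[left]) for xi in x]

def peak_metrics(x, y, left, right):
    """Peak position and baseline-corrected height inside [left, right]."""
    baseline = local_linear_baseline(x, y, left, right)
    corrected = [yi - bi for yi, bi in zip(y, baseline)]
    top = max(range(left, right + 1), key=lambda i: corrected[i])
    return x[top], corrected[top]

def synthetic_band(center=1495.5, half_width=3.0, step=0.5,
                   start=1480.0, count=61):
    """Hypothetical Lorentzian band on a sloping, nonzero baseline."""
    x = [start + step * i for i in range(count)]
    y = [1.0 / (1.0 + ((xi - center) / half_width) ** 2)
         + 0.1 + 1e-4 * (xi - start) for xi in x]
    return x, y

x, y = synthetic_band()
position, height = peak_metrics(x, y, 0, len(x) - 1)
```

Even in this idealized case the corrected height is slightly below the true band height, because the wings of the band contribute to the fitted line. Dividing each height by that of a reference peak gives normalized heights of the kind reported in Table 2.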

It can be seen that the observed spectral metrics depend on
the position at which the spectra are measured. Optical effects
distort the spectra by coupling the real part of the refractive index
and the sample structure into the data. While baselining has
removed some of the gross optical effects, the metrics are not
independent of morphology. It should be noted that correction
algorithms other than baseline subtraction have been proposed,
e.g., taking derivatives of the spectra or more advanced procedures.34,35
However, these procedures are typically ad hoc or do not fully
account for physical phenomena such as the coupling of the
dispersive line-shape (the real index) into the observed spectra
and the influence of the sample morphology on the collected data.
Hence they cannot capture the physics of the true distortions and
may provide unjustified confidence compared to uncorrected data.

The point mapping modality was also simulated, and the
measured spectra are shown in Figure 7. While there are
differences, the gross behavior can be seen to be similar to that
observed in the imaging modality. In this example, the observed
peak positions (Table 3) are the same as for the imaging case,
while the peak ratio (Table 4) metrics differ but exhibit a similar
amount of variability as the imaging case. The baseline characteristics
differ between the imaging and mapping modalities. This
is to be expected as scattering distorts the point spread function
of the light to spatially redistribute light intensity incident on the
detector: in imaging mode this means that light scattered from
an edge can affect neighboring pixels, while for mapping this type
of crosstalk does not occur.


(31) Figure 6 in ref 5.

(32) Malitson, I. H. J. Opt. Soc. Am. 1964, 54, 628-632.

(33) Equation 35 in ref 5.

(34) Kohler, A.; Kirschner, C.; Oust, A.; Martens, H. Appl. Spectrosc. 2005, 59,
707-716.

(35) Thennadil, S. N.; Martens, H.; Kohler, A. Appl. Spectrosc. 2005, 60, 315-
321.
Figure 6. Imaging spectra from a 7 μm thick toluene slab on 2 mm of barium fluoride. The absorbance is normalized by the slab thickness.
Spectra are shown from the center of the slab (x0 = 0) and in the vicinity of the edge (x = 50 μm). The full spectra (a), and details for x = 0
(b-d), x = 45 μm (e-g), x = 50 μm (h-j), and x = 55 μm (k-m) are shown. A baseline is illustrated by a dashed line in the detail plots, and
peak heights and positions are calculated as illustrated by the solid vertical lines. The calculated metrics are given in Tables 1 and 2.


Table 1. Peak Positions for the Imaging Data to the
Nearest 0.5 cm⁻¹

              peak 1   peak 2   peak 3   peak 4   peak 5   peak 6
ideal         3027.0   2920.0   1495.5   1460.5   1030.0   1081.5
x0 = 0        3027.0   2920.0   1495.5   1460.0   1030.0   1081.5
x0 = 45 μm    3027.0   2919.5   1495.5   1460.0   1030.0   1081.0
x0 = 50 μm    3026.5   2919.0   1495.5   1459.0   1030.0   1081.0
x0 = 55 μm    3027.0   2919.0   1495.0   1458.0   1029.5   1080.5

Table 2. Normalized Peak Heights for the Imaging Data

              peak 1   peak 2   peak 3   peak 4   peak 5   peak 6
ideal         1.00     0.514    1.79     0.436    0.444    0.350
x0 = 0        1.00     0.505    1.76     0.388    0.416    0.327
x0 = 45 μm    1.00     0.505    1.74     0.400    0.401    0.314
x0 = 50 μm    1.00     0.517    1.57     0.401    0.388    0.296
x0 = 55 μm    1.00     0.517    1.58     0.447    0.486    0.352

Note that the severity of the metric distortion depends on the
sample morphology and boundaries. For example, Figures S4 and
S5 in the Supporting Information show results for a sample
thickness of 2 μm rather than 7 μm. It can be seen that optical
distortions, such as the nonzero baseline, are less severe for the
2 μm thick sample. As noted earlier, thinner samples can, in
general, be expected to be less susceptible to distortions due to
optical phenomena than comparable thicker samples. Spectral
metrics are also affected to a lesser extent, as can be seen by
comparing the metric tables for the 2 μm thick sample (Tables
S2-S5 in the Supporting Information) to the metric tables for the
7 μm thick sample above. For example, in the latter, a maximum
peak shift of 2.5 cm-1 is observed, while for the 2 μm thick sample,
the maximum peak shift is 1 cm-1.
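
The peak metrics reported in Tables 1 and 2 can be illustrated with a short Python sketch. It is not the authors' code: it assumes that a peak position is the wavenumber of maximum baseline-corrected absorbance inside a user-chosen window, that the local baseline is a straight line between the two window ends, and that heights are normalized to peak 1 (as the 1.00 entries in the tables suggest). The function names and window limits are hypothetical.

```python
import numpy as np


def peak_metrics(wavenumber, absorbance, lo, hi):
    """Return (position, height) of the strongest peak inside [lo, hi].

    Assumption: the baseline is the straight line joining the spectrum
    values at the two ends of the window; the height is measured from
    that line at the peak position.
    """
    wavenumber = np.asarray(wavenumber, dtype=float)
    absorbance = np.asarray(absorbance, dtype=float)
    mask = (wavenumber >= lo) & (wavenumber <= hi)
    w, a = wavenumber[mask], absorbance[mask]
    order = np.argsort(w)  # np.interp needs increasing x values
    w, a = w[order], a[order]
    baseline = np.interp(w, [w[0], w[-1]], [a[0], a[-1]])
    corrected = a - baseline
    i = int(np.argmax(corrected))
    return w[i], corrected[i]


def normalized_heights(heights, reference=0):
    """Divide all peak heights by the height of the reference peak."""
    heights = np.asarray(heights, dtype=float)
    return heights / heights[reference]
```

Because the baseline is tied to the window ends, the reported height changes with the window choice; this is one reason the text recommends evaluating metric distortions for each specific sample.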

The  dependence  of  spectral  distortions  on  sample  parameters 
is  important  from  two  perspectives.  First,  the  effect  of  geometry 
becomes  difficult  to  quantify  in  simple  terms.  Hence,  a  measure 
of  the  systematic  deviations  in  the  spectrum  must  be  individually 
calculated  for  specific  samples.  This  is  especially  important  for 
studies that are interested in subtle chemical changes at edges
(often shifts of several wavenumbers) or in an algorithm-based search.
While careful simulations are prescribed for sensitive chemical
analyses, the strategy in database searching may be to use a
coarse spectral resolution. Second, in automated analysis algorithms
such as those for tissue histopathology,36 sample thickness
becomes an important parameter whose impact must be appreciated.
One approach may be to carefully control sample thickness
such  that  deviations  are  consistent  and  can  be  eliminated  from 
use  in  classification  algorithms  by  choice  of  appropriate  metrics. 
A  second  approach  is  to  use  a  large  number  of  samples  with  a 
thickness  variation  arising  from  the  natural  variation  of  the 


(36) Fernandez, D. C.; Bhargava, R.; Hewitt, S. M.; Levin, I. W. Nat. Biotechnol.
2005, 23, 469-474.

Figure 7. Point mapping spectra from a 7 μm thick toluene slab on 2 mm of barium fluoride. The absorbance is normalized by the slab
thickness. Spectra are shown from the center of the slab (x0 = 0) and in the vicinity of the edge (x0 = 50 μm). The full spectra (a) and details for
x0 = 0 (b-d), x0 = 45 μm (e-g), x0 = 50 μm (h-j), and x0 = 55 μm (k-m) are shown. A baseline is illustrated by dashed lines, and peak heights
and positions are calculated as illustrated by the solid vertical lines. The calculated metrics are given in Tables 3 and 4.


Table 3. Peak Positions for the Point Mapping Data to the Nearest 0.5 cm-1

              peak 1   peak 2   peak 3   peak 4   peak 5   peak 6
ideal         3027.0   2920.0   1495.5   1460.5   1030.0   1081.5
x0 = 0        3027.0   2920.0   1495.5   1460.0   1030.0   1081.5
x0 = 45 μm    3027.0   2919.5   1495.5   1460.0   1030.0   1081.0
x0 = 50 μm    3026.5   2919.0   1495.5   1459.0   1030.0   1081.0
x0 = 55 μm    3027.0   2919.0   1495.0   1458.0   1029.5   1080.5

Table 4. Normalized Peak Heights for the Point Mapping Data

              peak 1   peak 2   peak 3   peak 4   peak 5   peak 6
ideal         1.00     0.514    1.79     0.436    0.444    0.350
x0 = 0        1.00     0.502    1.75     0.386    0.415    0.326
x0 = 45 μm    1.00     0.504    1.73     0.403    0.397    0.309
x0 = 50 μm    1.00     0.520    1.56     0.395    0.382    0.292
x0 = 55 μm    1.00     0.509    1.55     0.435    0.475    0.342

protocol.  Any  developed  classification  algorithm  then  will  be 
insensitive  to  optics-induced  distortions  within  the  range  of 
thicknesses  used  in  the  development  of  the  protocol. 

Experimental  Comparison.  To  test  the  predictive  power  of 
the  model  presented  here,  it  is  useful  to  compare  experimental 
data  with  simulations  for  a  comparable  sample  and  imaging 
system. The sample data were recorded on a Varian Stingray
system using a mid-IR interferometer. The microscope of the
instrument is equipped with a narrowband, liquid nitrogen-cooled
mercury-cadmium-telluride (MCT) detector, as well as a 128 ×
128 pixel, liquid nitrogen-cooled focal plane array MCT detector.
Data were recorded at an undersampling ratio of 2 referenced to
the He-Ne laser, zero-filled by a factor of 2, and Fourier
transformed using Happ-Genzel apodization. The nominal spectral
resolution was 2 cm-1. The ratios of two similarly collected
image sets (one without a sample to serve as a background
and one with a sample) were taken pixel by pixel to obtain
absorbance image datasets. A common photoresist material,
SU-8 2000.5 (MicroChem Corp., Newton, MA), was spin coated
to an approximate thickness of 10 μm on a 25 mm diameter
barium fluoride (BaF2) disk and pattern cured by UV exposure
using a standard USAF 1951 target (Edmund Optics, Barrington,
NJ). The entire sample was baked at 95 °C and
developed as per standard protocols for postcuring. A postbake
at 150 °C for 5 min was performed to ensure complete
polymerization and long-term stability.
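
The pixel-by-pixel ratio of sample and background image sets described above can be sketched as follows. This is a minimal illustration under the usual definition of absorbance as the negative base-10 logarithm of the transmittance; the function name and the small floor `eps` (used only to avoid division by zero and the logarithm of zero) are assumptions and not part of the reported procedure.

```python
import numpy as np


def absorbance_from_ratio(sample, background, eps=1e-12):
    """Convert single-beam image cubes (rows, cols, bands) to absorbance.

    transmittance = sample / background, computed for every pixel and
    spectral element; absorbance = -log10(transmittance).
    """
    sample = np.asarray(sample, dtype=float)
    background = np.asarray(background, dtype=float)
    transmittance = sample / np.maximum(background, eps)
    return -np.log10(np.maximum(transmittance, eps))
```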

An image of the transmittance, at ν = 2903 cm-1, for a region
of the target is shown in Figure 8. The data measured along the
dashed line will be examined; in particular, the spatial-spectral
response across the edge of a bar structure is of interest. The
absorbance profile along the dashed line shown in Figure 8 is

Figure 8. Transmission image of a SU-8 bar target on barium
fluoride at 2903 cm-1. In subsequent figures, profiles will be displayed
from along the dashed line, and spectra will be plotted for the points
marked with a circle.


Figure 9. Absorbance profiles, before (a) and after (b) baseline
correction, across the bar target for three different wavenumbers. At
ν = 1283 cm-1 and ν = 2903 cm-1, the SU-8 polymer is absorbing.
At ν = 2100 cm-1, the polymer is nonabsorbing but scattering effects
produce apparent absorption at the edges.

plotted  for  three  different  wavenumbers  in  Figure  9a.  Apparent 
artifacts,  e.g.,  overshoot  in  the  absorbance  at  the  sample  edges, 
can  be  seen  to  vary  with  wavenumber.  These  distortions  arise 
from  both  baseline  offset  due  to  redistribution  of  intensity  by  the 
sample and changes in the apparent peak shape. Other wavenumber-dependent
effects are also visible, e.g., the change in
spatial  resolution  as  a  function  of  wavenumber  is  manifest  in  the 
differing  gradients  of  the  absorbance  profiles  at  the  edge. 

Subtracting  a  slowly  varying  baseline  is  a  common  method  to 
compensate  for  the  consequences  of  optical  effects  on  spectra.  In 
Figure  9b,  the  edge  profiles  are  replotted  after  a  linear  baseline 
has  been  subtracted  from  the  spectra.  For  each  of  the  spectra, 
the  baseline  was  found  by  linear  interpolation  between  minima 
of  the  SU-8  response,  specifically  between  the  absorbance  values 
at  910,  1423,  1551,  1827,  2696,  2783,  3111,  3736,  and  3931  cm-1. 
It  can  be  seen  that  the  baselining  procedure  qualitatively 
improves  the  edge  profiles,  at  least  in  absorbing  regions  of  the 

Figure 10. Experimental absorbance spectra taken at y = 22 μm
(a), y = 60.5 μm (c), and y = 71.5 μm (e) (i.e., the points marked
with a circle in Figure 8) and simulated spectra taken from the polymer
50 μm (b) and 6 μm (d) from the edge and from off the polymer 5 μm
from the edge (f). The peak location, after the illustrated baseline
correction, is displayed on the plots.


spectrum.  Such  subjective  baselining  can  lead  to  seemingly 
reasonable  results,  especially  when  scattering  is  high  and 
absorbance  is  low.  For  automated  analyses,  which  are  required 
due  to  the  large  number  of  pixels  (spectra)  making  manual 
correction  impossible,  simple  corrections  may  lead  to  errors. 
For  example,  at  2100  cm-1,  the  baselining  procedure  has 
resulted  in  some  nonphysical  negative  values  of  absorbance. 
Another  potential  concern  is  the  discrepancy  in  absorbance 
between the two bar targets. For the bar centered around y =
20 μm, the absorbance values at 1283 and 2903 cm-1 are
approximately  equal,  while  the  neighboring  bar  exhibits  a 
greater  difference,  despite  being  made  of  the  same  material 
and  being  subject  to  the  same  processing  history. 
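
The baselining procedure described above (linear interpolation between fixed anchor wavenumbers) can be written compactly. The sketch below uses the anchor positions quoted in the text; everything else (function name, use of `np.interp`, and holding the baseline constant outside the outermost anchors) is an assumption made for illustration and may differ from the processing actually used.

```python
import numpy as np

# Anchor wavenumbers (cm-1) quoted in the text as minima of the SU-8 response.
SU8_ANCHORS = (910, 1423, 1551, 1827, 2696, 2783, 3111, 3736, 3931)


def subtract_linear_baseline(wavenumber, absorbance, anchors=SU8_ANCHORS):
    """Subtract a piecewise-linear baseline drawn through the anchor points.

    The baseline equals the measured absorbance at each anchor and is
    linearly interpolated in between. Outside the outermost anchors,
    np.interp holds the end values constant (an assumption of this sketch).
    """
    wavenumber = np.asarray(wavenumber, dtype=float)
    absorbance = np.asarray(absorbance, dtype=float)
    order = np.argsort(wavenumber)  # np.interp needs increasing x values
    w_sorted, a_sorted = wavenumber[order], absorbance[order]
    anchors = np.sort(np.asarray(anchors, dtype=float))
    anchor_values = np.interp(anchors, w_sorted, a_sorted)
    baseline = np.interp(wavenumber, anchors, anchor_values)
    return absorbance - baseline
```

Nothing in this procedure prevents the corrected absorbance from becoming negative between anchors, which is consistent with the nonphysical values noted at 2100 cm-1.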

Quantitative  examination  of  the  collected  data  reveals  spectral 
distortions  of  the  type  predicted  earlier  in  the  article.  An  illustrative 
absorption peak is centered around ν = 1508 cm-1. Experimental
measurements of this peak are shown for various sample
locations in the left column of Figure 10. Data collection from
this peak can be simulated by first estimating the physical
properties of the sample. By comparison of the absorbance
measured at y = 22 μm in relation to the imaginary part of the
refractive index of SU-8 calculated in the previous article, the
thickness of the SU-8 was estimated to be approximately 7 μm.
The imaginary part of the refractive index was then estimated
from the absorbance33 (again from the measurement at y = 22
μm). Kramers-Kronig37 analysis was used to calculate the real
part  of  the  refractive  index,  thus  completing  the  description  of 
the  object.  Note  that  the  SU-8  refractive  index  calculated  in  the 
previous  article  was  not  employed,  as  differences  in  sample 
preparation  were  found  to  have  introduced  small  but  significant 
differences  in  the  optical  properties  of  the  polymer. 
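
The Kramers-Kronig step can be illustrated with a simple numerical sketch. It is not the procedure of ref 37; it assumes the standard dispersion relation n(ν) = n_inf + (2/π) P∫ ν' k(ν') / (ν'² - ν²) dν', a uniformly spaced wavenumber grid, and a crude principal value obtained by skipping the singular grid point. The function name and the constant high-wavenumber index `n_inf` are hypothetical inputs.

```python
import numpy as np


def kramers_kronig_real_index(wavenumber, k, n_inf):
    """Estimate the real refractive index n from the imaginary part k.

    Assumptions of this sketch: `wavenumber` is uniformly spaced and
    increasing, k is negligible outside the sampled range, and the
    principal value is approximated by omitting the point nu' = nu.
    """
    nu = np.asarray(wavenumber, dtype=float)
    k = np.asarray(k, dtype=float)
    d_nu = nu[1] - nu[0]
    n = np.full(nu.shape, float(n_inf))
    for i, nu_i in enumerate(nu):
        denom = nu ** 2 - nu_i ** 2
        denom[i] = np.inf  # the singular term contributes zero
        n[i] += (2.0 / np.pi) * np.sum(nu * k / denom) * d_nu
    return n
```

Because the integral is truncated to the measured range, the result depends on the assumed `n_inf`, which is one link in the chain of estimates discussed in the text.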

The  sample  edge  profile  and  the  instrument  were  modeled  in 
the  same  manner  used  to  generate  Figure  6,  except  that  the  inner 
and  outer  numerical  apertures  of  the  Cassegrain  were  taken  to 
be  0.26  and  0.4,  respectively.  These  values  are  consistent  with 
those  used  for  the  same  instrument  in  the  previous  article.  The 
experimental  and  predicted  spectral  profiles  of  parts  a  and  b  of 
Figure  10  agree  well.  This  is  to  be  expected  as  the  SU-8  refractive 
index  used  in  the  simulations  was  calculated  from  Figure  10a, 
and  this  region  of  the  sample  is  a  relatively  simple  layered 
structure. 

In  the  vicinity  of  the  polymer  edge,  the  peak  position  in  the 
experimental  data  can  be  seen  to  shift.  Since  the  target  structure 
is  made  of  a  single  material,  this  shift  can  most  likely  be  attributed 
to  optical  effects.  Nonuniform  curing  occurring  at  the  sample 
edges  can  be  ruled  out  due  to  extensive  postreaction  thermal  cure. 
The simulations also predict a peak shift toward lower wavenumber;
however, this shift is greater in the predictions than it is in
the measurements. There are several possible causes for this
overestimation. The characterization of the sample relied on a
chain of estimation procedures: the real index was estimated from
the imaginary index, which was in turn dependent on the assumed
sample thickness, so errors may have propagated. The
correct  prediction  of  bulk  spectra,  however,  suggests  that  this 
error  is  small.  The  sample  geometry  may  also  lead  to  errors  in 
prediction.  In  simulation,  the  edge  is  represented  by  a  steep 
gradient  between  two  perfectly  flat  surfaces.  In  reality,  the  sample 
edges  can  be  expected  to  have  some  finite  and  unknown  slope, 
and  the  horizontal  surfaces  in  the  bar  targets  may  not  be  perfectly 
flat. The broad agreement between the experimental and simulated
results of Figure 10 indicates that the model developed has
significant predictive power and allows an understanding of the
causes and effects of optical artifacts.

CONCLUDING  REMARKS 

This  article  presents  the  first  attempt  at  applying  rigorous 
optical  theory  to  heterogeneous  samples  in  IR  microspectroscopy. 
It  is  shown  that  lateral  structure  in  thin  samples  leads  to  significant 
effects  on  the  recorded  spectral  data  arising  from  a  coupling 
between  wavelength,  sample  geometry,  optical  properties  within 
the  sample,  presence  of  interfaces,  and  the  optical  setup.  With 
the  use  of  progressively  sophisticated  simulations,  the  effect  of 
each  of  these  factors  was  demonstrated  in  a  quantitative  manner. 
It was shown that the redistribution at the detector plane of the
intensity  incident  upon  the  sample  can  be  quantitatively  modeled 
and  verified  with  experiments.  The  implications  for  the  practice 
of  spectroscopy  are  that  the  spatial  and  spectral  variation  of  the 
real  and  imaginary  parts  of  the  index  of  the  sample  cannot  be 
decoupled  from  FT-IR  imaging  data,  as  is  currently  practiced.  It 
is  emphasized  that  recording  the  true  data  will  require  the 
development  of  both  new  instruments  that  can  provide  additional 
data  to  extract  true  spectral  properties  from  the  data,  as  well  as 
numerical methods to assist in the same. The theoretical framework
presented here should serve as a useful guide to estimate
the  true  structure  and  quantify  distortions  in  present  instruments 
as  well  as  a  platform  for  future  development. 

Fourier transform infrared (FT-IR) spectroscopic imaging is emerging as an automated alternative
to human examination in studying development and disease in tissue. The technology's speed and
accuracy, however, are limited by the trade-off with signal-to-noise ratio (SNR). Signal processing
approaches to reduce noise have been suggested but often involve manual decisions, compromising the
automation benefits of using spectroscopic imaging for tissue analysis. In this manuscript, we describe
an approach that utilizes the spatial information in the data set to select parameters for noise reduction
without human input. Specifically, we expand on the Minimum Noise Fraction (MNF) approach, in
which data are forward transformed, eigenimages that correspond mostly to signal are selected, and the
selected eigenimages are used in the inverse transformation. Our unsupervised eigenimage selection
method consists of matching spatial features in eigenimages with a low-noise gold standard derived
from the data. An order of magnitude reduction in noise is demonstrated using this approach. We apply
the approach to automating breast tissue histology, in which accuracy in classification of tissue into
different cell types is shown to strongly depend on the SNR of data. A high classification accuracy was
recovered from acquired data with ~10-fold lower SNR. The results imply that a reduction of almost
two orders of magnitude in acquisition time is routinely possible for automated tissue classifications by
using post-acquisition noise reduction.


1.  Introduction 

Fourier transform infrared (FT-IR) spectroscopic imaging1
with array detectors provides large data sets but often requires
long acquisition times to obtain data with a high signal-to-noise
ratio (SNR). Following conventional trading rules in IR spectroscopy,2
the signal is recorded multiple times and co-added to increase
the SNR of the data. In imaging, other
approaches have also been suggested due to the complex nature
of the acquisition process.3,4,5,6 Fundamentally, these methods
unavoidably trade an improvement in SNR against an increase in
acquisition time. Another approach may be to improve hardware,
but this is expensive and impractical for most users. A final and very
successful approach has been to trade off the spatial coverage per
scan using sensitive linear array detectors, which limits the
spatial coverage rate. For a finite data acquisition time, other
schemes to extract low noise information are available,7 but these
methods neglect the image as a whole and result in loss of image
fidelity. As a consequence, FT-IR imaging data acquisition is
limited in applications that require fast imaging at high fidelity.
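
The trading rule mentioned above can be made concrete with a small numerical sketch. It assumes uncorrelated, zero-mean noise of equal variance in every scan, in which case co-adding N scans improves the SNR by a factor of the square root of N; a 10-fold SNR gain therefore costs about 100 times the acquisition time, which is the scale of the time saving quoted in the abstract. The function names and the simulated Gaussian noise are illustrative assumptions.

```python
import math

import numpy as np


def scans_for_snr_gain(gain):
    """Number of co-added scans needed for a given SNR gain (gain squared)."""
    return int(math.ceil(gain ** 2))


def coadded_noise_std(n_scans, n_points=10000, sigma=1.0, seed=0):
    """Simulate the noise level left after averaging n_scans noisy scans."""
    rng = np.random.default_rng(seed)
    scans = rng.normal(0.0, sigma, size=(n_scans, n_points))
    return scans.mean(axis=0).std()
```

Comparing `coadded_noise_std(1)` with `coadded_noise_std(100)` shows the noise falling by roughly a factor of ten for a hundredfold increase in the number of scans.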

Using  computation  to  enhance  instrument  performance  is 
becoming  an  attractive  option  with  the  rapid  development  of 
powerful computers and increased storage capacities. A procedure
based on the Minimum Noise Fraction (MNF) transform,8
for example, was adopted from the satellite, airborne and other
imaging communities9 for IR spectroscopic imaging.10,11
Similarly, ideas in data compression with the potential for
attendant noise reduction are being proposed by other
groups.12,13 In this milieu, a general approach to noise reduction
is to use an eigenvalue decomposition of the data using
a forward transform, for example, a principal components
analysis (PCA). After selecting eigenimages with sufficient SNR,
the  selected  data  are  inverse  transformed  to  yield  the  entire 
dataset  with  lower  noise  content.  This  approach  was  used  in 
FT-IR  imaging,  for  example,14  to  examine  phase  compositions 
by  enhancing  contrast  between  different  regions.  PCA  reorders 
data  in  decreasing  order  of  variance.  Similarly,  techniques  can  be 
used  to  order  eigenimages  in  decreasing  order  of  SNR,  which  is 
the  aforementioned  MNF  transform.  A  modified  version15  of  this 
transform  was  shown  to  improve  image  fidelity  and  achieve 
better  noise  reduction  than  PCA,  for  example. 
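
The forward transform, eigenimage selection and inverse transform described above can be sketched for the PCA case as follows. This is an illustrative outline and not the implementation of the cited works; it assumes the data are stored as a (rows, cols, bands) array and simply retains the first `n_keep` components, which is the conventional choice that the remainder of the paper questions.

```python
import numpy as np


def pca_denoise(cube, n_keep):
    """Denoise an image cube by truncated PCA.

    Returns the reconstructed cube and the retained eigenimages
    (principal component scores reshaped to images).
    """
    cube = np.asarray(cube, dtype=float)
    rows, cols, bands = cube.shape
    X = cube.reshape(-1, bands)          # one spectrum per row
    mean = X.mean(axis=0)
    Xc = X - mean
    # Forward transform: SVD of the centred data (variance-ordered).
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_keep] * s[:n_keep]  # flattened eigenimages
    # Inverse transform using only the retained components.
    X_hat = scores @ Vt[:n_keep] + mean
    return X_hat.reshape(rows, cols, bands), scores.reshape(rows, cols, n_keep)
```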

Mathematical  transform  techniques  for  noise  reduction 
generally  utilize  the  property  that  noise  is  uncorrelated  whereas 
spectra  (signals)  have  a  higher  degree  of  correlation.  In  the 
transform  domain,  hence,  the  signal  becomes  largely  confined  to 
a  few  eigenvalues  whereas  the  noise  is  spread  across  all.  Noise 
reduction can be achieved by retaining eigenvalue images that
correspond to high signal content and computing the inverse
transform. It is the relative proportion of the signal and noise
which  forms  a  criterion  for  inclusion  of  specific  eigenimages  in 
the  inverse  transform.  Inclusion  of  too  many  will  not  allow  for 
significant  noise  rejection,  while  inclusion  of  too  few  would  result 
in  loss  of  fine  spectral  features.  Hence,  identifying  eigenvalues 
corresponding  to  high  signal  content  is  an  important  step  in  the 
noise  reduction  process.16,17  Most  methods16,18,19  choose  the  first 
m  eigenimages.  The  value  of  m  may  be  chosen  by  considering  the 
decay  of  the  information  content  (eigenvalues).  The  assumption 


2818  |  Analyst,  2010,  135,  2818-2825 


This  journal  is  ©  The  Royal  Society  of  Chemistry  2010 


Published  on  18  October  2010  on  http://pubs.rsc.org  |  doi:10.1039/C0AN00350F 




that the first m eigenimages should be chosen, however, is
questionable. The MNF approach, for example, was specifically
developed in response to the observation that the first m eigenimages
in PCA were not always optimal and proposed, instead,
a noise-based ordering. Other methods17,20 can be computationally
expensive or do not utilize some of the features of the
data. All methods, hence, place the ordering burden on the
decomposition algorithm and do not directly utilize features of
the data or the unique features of the image acquisition process.
The method proposed here addresses this gap by using the
structure within the data to select features.

Another  general  criticism  of  present  methods  is  that  they  do 
not  explicitly  account  for  the  correlated  spatial  and  spectral 
information  in  the  data.  The  variance  in  data  may  arise  from 
measurement  noise,  sensor  characteristics  or  due  to  scattering 
effects from the sample. For example, the MNF approach can be
shown to rigorously order images in decreasing order of random
noise. Implicitly, the signal in the re-ordering of MNF eigenimages
is assumed to arise from features in the image but could
come from sources other than the sample of interest. We present
such a case in Fig. 1, which shows the 4th, 8th, 12th and 19th
eigenimages for FT-IR imaging data from a breast tissue sample
acquired following procedures previously reported.21 The 4th
eigenimage shows interesting tissue morphological, i.e., structural,
features. Although the 8th eigenimage has higher SNR compared
to the 12th or 19th, the 12th and 19th eigenimages seemingly
contain more features of interest. Here one would
include the 12th and 19th but not the 8th image in a noise
reduction scheme. The 8th eigenimage likely arises from water
vapor differences rather than from the sample itself, as can be
seen by examining the spectra; this data set was acquired using a
small linear array that is raster scanned horizontally from bottom
to top. Hence, for spectroscopic imaging data sets, it may

Fig. 1 (A) 4th MNF factor (tissue features are apparent), (B) 8th MNF
factor, (C) 12th MNF factor, (D) 19th MNF factor. The 8th factor has less
apparent structural features than the others and is dominated by measurement
artifacts.


be  more  instructive  to  employ  a  method  of  selecting  eigenimages 
that  accounts  for  both  spectral  and  spatial  correlations. 

There  is  no  universal  algorithm  to  optimally  include  both 
spatial structure pertinent to the sample and spectral characteristics
in selecting appropriate eigenimages. Hence, the identification
of eigenimages to include in the data inversion process is
invariably a manual task. This requirement limits the automation
advantage and is a key impediment to automated
and  routine  application  of  noise  rejection  methods.  First,  though 
the  effect  is  likely  to  be  small,  manually  selected  eigenimages  will 
likely  vary  from  practitioner  to  practitioner  and  may  lead  to 
variance  in  scientific  conclusions  or  confidence  in  results.  Second, 
the  need  to  examine  every  eigenvalue  image  (or,  at  least,  a  large 
set  of  images)  is  time-consuming.  The  decision  to  exclude  or 
include  images  with  questionable  content  requires  significant 
time  and  some  other  guidance,  e.g.  a  complementary  optical 
microscopy  image.  While  such  data  are  often  available,  they  are 
presently  not  used  in  noise  rejection.  In  this  manuscript,  we 
propose  a  method  to  automatically  determine  eigenimages  to  use 
in  an  inverse  transform  for  effective  noise  rejection  by  enabling 
the  use  of  additional  information  to  recognize  important  features 
in  the  data.  The  proposed  algorithm  selects  eigenimages  based  on 
structural  features  in  a  quantitative  manner  by  utilizing  both  the 
correlation  between  spectra  as  well  as  the  spatial  information  in 
the  image.  We  test  the  automated  noise  rejection  algorithm  by 
comparing  information  about  tissue  structure  extracted  from 
data  before  and  after  noise  rejection.  Last,  the  improvements  in 
SNR  are  quantified  and  discussed  in  terms  of  potential  data 
acquisition  strategies. 

2.  Methods 

2.1.  Mathematical  background  to  the  proposed  method 

The MNF transform was introduced by Green et al.8 to order
multispectral data in terms of image quality, and we briefly
describe the background to our approach next. Consider a
three-dimensional (3-D) dataset $X_k(t)$, where $t = (i, j)$ represents spatial
data coordinates and $k$ denotes the spectral element index.
If the number of spectral elements in the data is $M$, then
$X(t) = [X_1(t), X_2(t), X_3(t), \ldots, X_M(t)]^T$, and the true spectral value,
$S$, and additive noise, $N$, are related as

$$X(t) = S(t) + N(t) \qquad (1)$$

Consequently, assuming that signal and noise are uncorrelated, the covariances are related through

$$\mathrm{Cov}(X) = \mathrm{Cov}(S) + \mathrm{Cov}(N) \qquad (2)$$

Next, the noise fraction for the $k$th spectral element is defined in
terms of the variance of the noise

$$F_k = \mathrm{Var}(N_k)/\mathrm{Var}(X_k) \qquad (3)$$

which is the ratio of noise variance to the total variance of that
spectral element. The MNF transform is a linear combination of
bands

$$Y_k(t) = \sum_{m=1}^{M} a_{km} X_m(t) \qquad (4)$$

such that the noise fraction $F_k$ is minimum for $Y_k(t)$ among all
linear transformations orthogonal to $Y_j(t)$, $j = 1, 2, \ldots, k-1$.
The vectors $a_k = [a_{k1}, a_{k2}, \ldots, a_{kM}]^T$ are the left-hand eigenvectors of
$\Sigma_N \Sigma^{-1}$, where $\Sigma_N = \mathrm{Cov}(N)$ and $\Sigma = \mathrm{Cov}(X)$. Also, the eigenvalue $\lambda_k$ corresponding to $a_k$ is equal to the
noise fraction of $Y_k$, i.e.,

$$\lambda_k = F_k \qquad (5)$$

The definition of MNF would imply that $\lambda_1 \le \lambda_2 \le \ldots \le \lambda_M$.
Since $\lambda_k$ corresponds to the noise fraction, MNF re-orders
spectral elements in terms of increasing $F_k$ or, equivalently, in
terms of decreasing SNR. The same set of eigenvectors is
obtained from maximizing the SNR or minimizing the noise fraction. However,
in the approach that maximizes SNR, higher eigenvalues
correspond to higher SNR, so that decreasing order of SNR
corresponds to decreasing order of eigenvalues,
$\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_M$. Our implementation
uses this approach to compute MNF transforms. It is
useful to note that since the MNF transform depends on the
signal-to-noise ratio, it is invariant under scale changes to any band (unlike
principal components).
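
A compact numerical sketch of the transform follows. It is not the authors' implementation: the noise covariance is estimated from differences between horizontally adjacent pixels (a common choice that is an assumption here, since the text does not state how Cov(N) is obtained), and the generalized eigenproblem is solved by noise whitening with NumPy only. Components are returned in decreasing order of SNR, in line with the SNR-maximizing convention described above; the returned noise fractions are the reciprocals of the eigenvalues of the whitened data covariance. All function names are hypothetical.

```python
import numpy as np


def mnf_transform(cube):
    """Forward MNF transform of a (rows, cols, bands) cube.

    Returns (eigenimages, A, noise_fraction, mean); the columns of A are
    the vectors a_k, ordered by decreasing SNR.
    """
    cube = np.asarray(cube, dtype=float)
    rows, cols, bands = cube.shape
    X = cube.reshape(-1, bands)
    mean = X.mean(axis=0)
    Xc = X - mean
    # Assumed noise estimate: neighbouring-pixel differences, scaled so
    # that their covariance approximates Cov(N) for uncorrelated noise.
    diff = (cube[:, 1:, :] - cube[:, :-1, :]).reshape(-1, bands) / np.sqrt(2.0)
    cov_n = np.cov(diff, rowvar=False)
    cov_x = np.cov(Xc, rowvar=False)
    # Whiten the noise: W.T @ cov_n @ W is the identity matrix.
    evals_n, evecs_n = np.linalg.eigh(cov_n)
    evals_n = np.maximum(evals_n, 1e-12 * evals_n.max())
    W = evecs_n / np.sqrt(evals_n)
    # Eigenvalues of the whitened data covariance equal 1 / F_k.
    inv_fraction, V = np.linalg.eigh(W.T @ cov_x @ W)
    order = np.argsort(inv_fraction)[::-1]
    A = W @ V[:, order]
    noise_fraction = 1.0 / inv_fraction[order]
    eigenimages = (Xc @ A).reshape(rows, cols, bands)
    return eigenimages, A, noise_fraction, mean


def mnf_inverse(eigenimages, A, mean, keep):
    """Inverse transform using only the eigenimages whose indices are in keep."""
    rows, cols, bands = eigenimages.shape
    Y = eigenimages.reshape(-1, bands)
    mask = np.zeros(bands, dtype=bool)
    mask[list(keep)] = True
    X_hat = (Y * mask) @ np.linalg.inv(A) + mean
    return X_hat.reshape(rows, cols, bands)
```

The `keep` argument is where an eigenimage selection rule enters; passing the first m indices reproduces the conventional choice, while any other index set can be supplied instead.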

2.2.  Proposed  algorithm  based  on  MNF-transform  for  noise 
reduction 

The  MNF  transform  is  first  computed  following  the  method 
above.  In  heterogeneous  materials  and  tissue,  we  note  that  the 
eigenimages  also  have  structure  corresponding  to  the  true 
structure of the material. The contrast and precise values of a
spectral image and an eigenimage, of course, cannot be equated, but both
types of images have distinct spatial domains. Domains are
defined  by  their  edges  and  this  property  forms  the  basis  of  our 
eigenimage  selection  scheme.  Our  proposed  method  relies  on 
leveraging  the  spatial  structure  in  spectroscopic  imaging  data 
with  structural  details  in  eigenimages  via  comparisons  of  domain 
edge  profiles.  Domains  in  breast  tissue,  for  example,  include 
boundaries  of  the  sample,  ducts  and  transitions  between  different 
structural  units.  Several  methods  for  edge  detection22  based  on 
different  filters  and  different  thresholding  schemes  have  been 
proposed  and  studied.  Canny’s  method,23  in  particular,  is  widely 
used  and  was  found  to  be  effective  for  our  application.  We 
evaluated  two  other  edge  detection  methods  (Sobel,  Roberts)  but 
found  the  Canny  method  better  suited  to  the  relative  domain  and 
pixel  sizes  likely  because  Canny’s  method  has  been  shown  to  be 

Fig. 2 A typical image (left) obtained by plotting the absorbance of the
sample (here at 3400 cm-1) and the corresponding edge map (right)
obtained after median filtering and using Canny's method.


optimal  with  respect  to  detection,  localization  and  response.23 
The  result  of  edge  detection  is  a  binary  image  that  is  termed  an 
‘edge  map’.  A  typical  absorbance  image  and  edge  map  is  shown 
in  Fig.  2.  While  it  is  indeed  possible  to  determine  edges  for  any 
image in general, confounding effects may arise when the domain
sizes are similar to pixel sizes and the images are noisy. Hence,
an intermediate step may be to use a median filter. The choice of
size of the median filter will be a compromise between the size of
structural features and pixel sizes. Using a large median filter
would be effective in removing pixel-to-pixel variations but could
also result in loss of features, especially those that are smaller
than the size of the median filter. Median filters of sizes between
7 × 7 and 13 × 13 were found to be most effective for the samples
considered here in that the results were very consistent regardless
of the choice of filter. Hence, we elected to routinely use a 9 × 9
pixel filter, which is used on the data in Fig. 2 prior to obtaining
the  edge  map. 
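
The two-step construction of an edge map (median filtering followed by edge detection) can be sketched with NumPy alone. Note the substitution: the text uses Canny's method, whereas this sketch thresholds a Sobel gradient magnitude as a simpler stand-in, so it illustrates the workflow rather than reproducing the reported edge maps. The function names and the relative threshold of 0.5 are assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_filter(image, size=9):
    """Median filter with an odd, square window and reflected borders."""
    pad = size // 2
    padded = np.pad(np.asarray(image, dtype=float), pad, mode="reflect")
    windows = sliding_window_view(padded, (size, size))
    return np.median(windows, axis=(-2, -1))


def sobel_edge_map(image, threshold=0.5):
    """Binary edge map: Sobel gradient magnitude above a relative threshold.

    Stand-in for Canny's method (no smoothing, non-maximum suppression
    or hysteresis is performed here).
    """
    img = np.pad(np.asarray(image, dtype=float), 1, mode="edge")
    gx = (img[:-2, 2:] + 2 * img[1:-1, 2:] + img[2:, 2:]
          - img[:-2, :-2] - 2 * img[1:-1, :-2] - img[2:, :-2])
    gy = (img[2:, :-2] + 2 * img[2:, 1:-1] + img[2:, 2:]
          - img[:-2, :-2] - 2 * img[:-2, 1:-1] - img[:-2, 2:])
    magnitude = np.hypot(gx, gy)
    if magnitude.max() == 0:
        return np.zeros(magnitude.shape, dtype=bool)
    return magnitude >= threshold * magnitude.max()


def edge_map(image, size=9, threshold=0.5):
    """Median-filter the image, then compute its binary edge map."""
    return sobel_edge_map(median_filter(image, size), threshold)
```

The same `edge_map` function can be applied to an absorbance image and to each eigenimage, giving binary maps that can then be compared.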

Once edge maps for the eigenimages are obtained, we seek to
compare them to an ‘ideal’ image (I). It is desirable that this
image contain all structural details of interest and have a
high SNR. The edge map of I and the edge maps of the eigenimages can
then be compared, and eigenimages that are sufficiently representative
of the sample structure may be included in the inversion.
The proposed method does not depend on any specific
method to generate edge maps, and diverse contrast mechanisms
can be used to construct the edge map of the ideal image. Several
possibilities are discussed next. The first MNF eigenimage
corresponds to the highest SNR and likely contains the greatest
sample detail; hence, it could be used as an ideal image. Another
avenue may be to choose an image based on the molecular
characteristics of the sample if prior knowledge about the sample
is available. For example, the spectral characteristics of tissues for
a given organ are often well-constrained in the spectral regions
from ~950 cm⁻¹ (lower detector cut-off in some of the
experiments here) to ~1800 cm⁻¹ (mainly bending and rocking
vibrational modes of molecules) and from ~2765 cm⁻¹ to
~3750 cm⁻¹ (stretching modes). An integrated absorbance in
those regions may be used but can be susceptible to edge
distortions due to molecularly non-specific scattering.24 While
multiplicative scatter correction25 and rigorous optical theory26,27
approaches are emerging, an approximation to removing scattering
distortion is the use of second derivatives of spectra.28 The
sum of the absolute values of the second derivative data is then
indicative of the overall chemical composition of the tissue. The
Savitzky-Golay filter used for computing derivatives also reduces
noise while preserving peak heights and widths, providing a high
SNR I (Fig. 2) that captures features from important spectral
bands. Yet another alternative is to calculate the Gram-Schmidt
intensity of the interferogram of the sample,29 which could be
a faster route because it avoids the FT process. The image, however,
would retain both structural and biochemical contributions from
all functional groups and scattering interfaces. Finally, another
approach could be to use the bright field optical microscopy
image. The optical image, however, may not contain sufficient
contrast or show the differences observed in the IR image, or may
have a mismatch in resolution. The IR “bright field”
equivalent, which is simply the height of the centerburst, may be
used. Since a background is collected for absorbance data, the
sample data set can be easily corrected for illumination
differences. This approach can be considered a combination of
both IR spectral (absorbance) and visible optical (scattering)
imaging.

2820 | Analyst, 2010, 135, 2818-2825
This journal is © The Royal Society of Chemistry 2010
Published on 18 October 2010 on http://pubs.rsc.org | doi:10.1039/C0AN00350F
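The second-derivative route to an ideal image can be sketched as follows. This is a minimal illustration, assuming a Savitzky-Golay window of 9 points and a cubic polynomial (neither value is stated in the text); the function names and the synthetic spectrum are hypothetical. The filter weights are derived from a least-squares polynomial fit rather than taken from a library routine.

```python
import numpy as np

def savgol_second_derivative_kernel(window=9, polyorder=3, spacing=1.0):
    """Savitzky-Golay weights for the second derivative at the window centre."""
    half = window // 2
    offsets = np.arange(-half, half + 1, dtype=float)
    # Least-squares polynomial fit: row k of the pseudo-inverse yields coefficient a_k.
    design = np.vander(offsets, polyorder + 1, increasing=True)
    a2_weights = np.linalg.pinv(design)[2]
    return 2.0 * a2_weights / spacing**2  # second derivative at the centre is 2 * a_2

def second_derivative_intensity(spectrum, window=9, polyorder=3, spacing=1.0):
    """Sum of |second derivative| over the fully overlapped part of one spectrum."""
    kernel = savgol_second_derivative_kernel(window, polyorder, spacing)
    # The kernel is symmetric, so convolution and correlation give the same result.
    second = np.convolve(spectrum, kernel, mode="valid")
    return float(np.sum(np.abs(second)))

# Hypothetical spectra: a sloping (scattering-like) baseline, with and without a band.
wavenumbers = np.arange(950.0, 1800.0, 2.0)
baseline = 0.001 * wavenumbers + 0.2
band = np.exp(-0.5 * ((wavenumbers - 1650.0) / 15.0) ** 2)

baseline_only = second_derivative_intensity(baseline, spacing=2.0)
with_band = second_derivative_intensity(baseline + band, spacing=2.0)
```

The sloping baseline contributes essentially nothing while the absorption band does, which is why the summed absolute second derivative is less sensitive to broad scattering distortions. For an image cube with the spectral axis last, `np.apply_along_axis(second_derivative_intensity, -1, cube)` would give one intensity per pixel.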

Having chosen an ‘ideal’ image I, its edge map E_I is computed.
Next, each eigenimage is filtered using the same kernel as that
used for the ideal image edge map, and edge maps E_j, j = 1, ..., M
are found. In practice, the number of significant eigenimages
is much smaller than the number of spectral data points. Hence,
it is prudent to consider a smaller subset of eigenimages to save
computation time. In carefully examining resulting images from
the MNF transform, we noticed that most information content
was in the first 30 eigenimages. Hence, we chose to examine
a smaller subset of MNF transformed eigenimages (P = 64) from
the available spectral data points (1640). This represents
a substantial reduction in the time for comparison and data
storage needed. Next, the root mean square error (RMSE)
between E_I and E_j, j = 1, 2, 3, ..., P, is computed as a measure of the
spectral similarity of the images using


$$\mathrm{RMSE}_j = \sqrt{\frac{1}{N} \sum_{p=1}^{b_x} \sum_{q=1}^{b_y} \left( E_I(p,q) - E_j(p,q) \right)^2}, \qquad j = 1, 2, 3, \ldots, P \qquad (6)$$


where b_x and b_y are the numbers of pixels along the rows and columns of the
image array. A typical plot of RMSE as a function of eigenimage
number is shown in Fig. 3. The RMSE prior to sorting follows the
decreasing order of importance from the MNF transform and
shows that factors corresponding to higher eigenvalues (lower
eigenimage number) may not necessarily have significant
features. While the eigenvalue curve is monotonically
decreasing, the RMSE curve of the MNF eigenimages displays
significant fluctuation. Re-ordering by RMSE values not only
prioritizes the important eigenimages by assigning them a lower
number but also makes the curve smooth and amenable to accurately
determining saturation or calculating derivatives. It is notable,
though, that the actual RMSE values are not affected by
re-ordering; the information content of all the transformed data
is only re-prioritized and not altered in any manner. It is
instructive to compare the RMSE-ordered and MNF-ordered
eigenimages (Fig. 3). While it appears that ~60 eigenimages
would be important from the MNF-ordered plot, the RMSE-ordered
curve indicates that ~30 images may be useful in the
inverse transform. This discordance arises because MNF-ordered
data are sensitive to any structure in the image, while the
RMSE-ordered data are sensitive only to the structure in the
reference image. Hence, prioritizing eigenimages by RMSE is
likely beneficial. It should be noted that the eigenimage number
for the unsorted RMSE and MNF order is the same. While there
is a generally increasing trend (decreasing importance) for both
values, the RMSE plot appears to be noisy. Since the RMSE is
directly a measure of concordance, we sorted eigenimage
numbers based on increasing RMSE and assigned them new
eigenimage numbers based on RMSE values.

Fig. 3 (Left) Typical error plot before sorting (red) and after sorting
(black) for RMSE, where the increasing eigenimage number indicates
a decreasing order of importance. (Right) Eigenimage order (number) as
ranked by the MNF transform.
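The RMSE comparison and re-ordering can be sketched in a few lines. The sketch assumes that the normalization N in eqn (6) is the total number of pixels, and it uses small synthetic binary edge maps; the function names and the test data are hypothetical.

```python
import numpy as np

def edge_map_rmse(ideal_edges, eigen_edges):
    """RMSE between the ideal edge map (rows x cols) and each of P edge maps (P x rows x cols)."""
    diff = np.asarray(eigen_edges, dtype=float) - np.asarray(ideal_edges, dtype=float)
    return np.sqrt(np.mean(diff**2, axis=(1, 2)))

def reorder_by_rmse(rmse):
    """Return eigenimage indices sorted by increasing RMSE, and the sorted RMSE curve."""
    order = np.argsort(rmse, kind="stable")
    return order, rmse[order]

def flip_first_pixels(edge_map, count):
    """Hypothetical helper: corrupt an edge map by inverting its first `count` pixels."""
    flat = edge_map.copy().ravel()
    flat[:count] = 1.0 - flat[:count]
    return flat.reshape(edge_map.shape)

# Synthetic 'ideal' edge map and four eigenimage edge maps of varying fidelity.
ideal = np.eye(8)
eigen_edges = np.stack([flip_first_pixels(ideal, k) for k in (4, 32, 8, 16)])

rmse = edge_map_rmse(ideal, eigen_edges)
order, sorted_rmse = reorder_by_rmse(rmse)
```

Here the second eigenimage (index 1) agrees least with the ideal edge map and is moved to the end of the list, while the RMSE values themselves are unchanged by the re-ordering.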

The importance of the alteration by sorting can be understood
by examining the RMSE plot (Fig. 3) in conjunction with the
spatial features in eigenimages as seen in Fig. 4. Eigenimages and
their corresponding edge maps demonstrate first that images
with significant features (e.g. numbers 1, 3, 10 and 18) have
well-defined edge maps while those without significant features
(e.g. number 46) have nondescript edge maps. Second, the spatial
similarity of early factors with the reference edge map results in
lower RMSE values that increase with increasing noise. When the
information content of the image is dominated by noise, the
RMSE between any edge map and that of the ideal image is
nearly independent of the actual edges, resulting in the plateau
region of the RMSE curve. Eigenimages close to the chosen cutoff
have edge maps with a semblance of features buried in noise.
By choosing all factors corresponding to RMSE values less than
that at the cutoff point, we select only those factors with
significant features. The derivative of the curve in the plateau
region is negligible and could also be utilized in finding the
cutoff point. We choose the cutoff to be the point after which the
derivative does not rise more than μ + 3σ, where μ and
σ correspond to the mean and standard deviation of the
derivative of the flat region of the curve. This is a very strict condition
that maintains a high degree of spectral detail. Other cutoff
values may be chosen, for example, μ + σ or simply the first
image whose RMSE exceeds μ. Our interest was in preserving as
much spectral detail as possible; hence, we adopt a criterion that
may be more stringent than most and likely represents a lower
level of improvement in SNR than other cutoffs. In summary,
computing the MNF transform, selecting eigenimages based on
sorted RMSE from edge maps and computing the inverse MNF
using the reduced set of eigenimages prior to the cutoff is
a completely automated noise reduction algorithm that does not
require human input. There are choices that can be made while
setting up the protocol, for example, in the choice of the reference
image, that are under operator control. Once the protocol is
finalized, however, the process is entirely automated and can be
high throughput. Thus, the criteria of both objectivity and
automation for noise reduction are addressed.
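One possible reading of the μ + 3σ cutoff rule is sketched below. The text does not say how the flat region is delimited, so the sketch assumes it is the last `plateau_points` steps of the sorted curve; that parameter, the function name and the synthetic RMSE curve are all hypothetical.

```python
import numpy as np

def select_cutoff(sorted_rmse, plateau_points=20, n_sigma=3.0):
    """Number of leading eigenimages to keep, given an RMSE curve sorted in increasing order."""
    sorted_rmse = np.asarray(sorted_rmse, dtype=float)
    derivative = np.diff(sorted_rmse)        # derivative[i] = rmse[i + 1] - rmse[i]
    plateau = derivative[-plateau_points:]   # assumed flat tail of the curve
    threshold = plateau.mean() + n_sigma * plateau.std()
    above = np.flatnonzero(derivative > threshold)
    # Keep everything up to the last step that still rises above the threshold.
    return 0 if above.size == 0 else int(above[-1]) + 1

# Synthetic sorted curve with P = 64 points: a steady rise followed by a slowly creeping plateau.
rising = np.linspace(0.2, 0.5, 30)
plateau_steps = 1e-5 * (1.0 + 0.5 * np.cos(np.arange(34)))
sorted_rmse = np.concatenate([rising, 0.5 + np.cumsum(plateau_steps)])

n_keep = select_cutoff(sorted_rmse)
```

Replacing `n_sigma=3.0` with `1.0` gives the μ + σ alternative mentioned in the text; the eigenimages before the returned position would then be passed to the inverse MNF transform.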

3.  Experimental 

Tissue used for this study (Biomax Inc.) was processed as per
procedures reported earlier.30 Spectroscopic imaging data were
acquired using the Perkin-Elmer Spotlight 400 imaging spectrometer,
which is equipped with a linear array detector and
samples a 6.25 μm × 6.25 μm area per pixel. An undersampling
ratio of two with reference to the He-Ne laser and a mirror scanning
speed of 1 cm/s were used to sample the interferogram to
provide a spectral resolution of 4 cm⁻¹. The interferogram at
every pixel was then Fourier transformed using a zero-filling factor
of two and Norton-Beer (NB) medium apodization and truncated to
4000-720 cm⁻¹ for efficient storage. A background single beam
reference was collected by averaging 120 scans and sample spectra
were acquired by averaging two interferometer scans. To validate
the method for different instruments, we implemented the same
algorithm on data acquired from a system equipped with a large
two-dimensional array detector (Varian Stingray). The system
consists of a Varian 7000 spectrometer coupled to a microscope
accessory, UMA-400. The imaging detector is a liquid nitrogen-cooled
mercury cadmium telluride (MCT) focal plane array that
is windowed to 32 × 32 elements (Santa Barbara Focalplane).
The detector samples an area of 175 μm × 175 μm at the sample.
Interferograms were acquired in rapid scan mode with an
undersampling ratio of 2 at a spectral resolution of 4 cm⁻¹ and
Fourier transformed using a factor of two zero-filling and
NB medium apodization. The data were truncated to
4000-950 cm⁻¹ for storage. For these data, the number of
co-additions was varied (1, 2, 4, 8, 16, 32 and 64 scans) to obtain
a range of poor to good SNR data. The background reference
was collected at 120 co-additions.

Fig. 4 Typical eigenimages (A) ordered by the MNF transform. (B) Corresponding edge maps for MNF-ordered images. (C) RMSE re-sorted
eigenimages within this subset. Panel labels: eigenimages 1, 3, 10, 18, 37 and 46.

All software used was written in-house or utilized programs in
ENVI/IDL. Computing MNF transforms involves estimating
noise statistics. ENVI can use a shift difference method to
compute noise statistics, which assumes that every pixel contains
both signal and noise, and that adjacent pixels contain the same
signal but different noise. A shift difference is performed on the
data by differencing the adjacent pixels above and to the right of each
pixel and averaging the results to obtain the ‘noise’ value to
assign to the pixel being processed. To the extent that this
assumption is not true, the noise statistics estimate is in error.
Rigorously, the noise should be estimated using repeat
measurements, which is easily possible in FT-IR imaging. With
the commercial raster scanning system, however, we were unable
to obtain successive measurements without a new scan. The
positioning error of the stage was such that slight pixel shifting
was observed, precluding true averaging at every pixel. Hence,
we employed the shift difference method in this study. Because the pixel
size is set smaller than the lowest resolution achievable and
the data here generally contain large phases, the
estimate is likely close.
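The shift difference idea can be sketched as below. This follows only the verbal description above (difference with the neighbours above and to the right, then average) and is not ENVI's implementation, whose details may differ; the function names, the array layout (rows, columns, bands) and the test cube are assumptions.

```python
import numpy as np

def shift_difference_noise(cube):
    """Per-pixel noise estimate for a (rows, cols, bands) cube; one row and one column are lost."""
    cube = np.asarray(cube, dtype=float)
    centre = cube[1:, :-1, :]
    above = cube[:-1, :-1, :]   # neighbour one row up
    right = cube[1:, 1:, :]     # neighbour one column to the right
    return 0.5 * ((centre - above) + (centre - right))

def noise_covariance(cube):
    """Band-by-band covariance matrix of the shift-difference noise estimate."""
    noise = shift_difference_noise(cube)
    return np.cov(noise.reshape(-1, noise.shape[-1]), rowvar=False)

# Hypothetical noise-free cube with identical signal in every pixel.
flat = np.full((5, 6, 3), 0.2)
noise = shift_difference_noise(flat)
covariance = noise_covariance(flat)
```

When adjacent pixels really do share the same signal, as in this flat cube, the estimated noise is zero; any spatial gradient in the signal would leak into the estimate, which is the error the text warns about.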

4.  Results  and  discussion 

In order to quantify the SNR gain from noise reduction, we first
acquired high SNR data using the linear array system as a base
for simulations and as a comparator. Poor SNR data were simulated
from these data by adding noise from a normal distribution
with different standard deviations (σ = 0.001, 0.01, 0.1 and
0.4 a.u.) as shown for a single pixel in Fig. 5(A). Resulting spectra
after noise reduction are shown in Fig. 5(B). An improvement is
apparent, even in cases where noise appears to overwhelm
spectral features. We then acquired data on the large array
system in which single scan acquisition was compared to 64 scan
acquisition (~70 fold slower). As expected, noise-reduced data
were found to be comparable to data acquired with the high scan number. To
quantify the benefits of post-acquisition processing, the reduction
in noise achieved is plotted in Fig. 6. Noise values were
calculated using the non-absorbing 1950-2000 cm⁻¹ region
with 41 spectral points around 1975 cm⁻¹ and are averages of
1024 spectra.

Fig. 5 (A) Acquired high SNR data and simulated noisy spectra obtained by adding noise (σ = 0.001, 0.01, 0.1 and 0.4 a.u.), showing the degradation in
data quality. Spectra are offset for clarity. (B) Corresponding spectra after noise reduction. (C) Absorption spectrum (1-scan in black) compared to the
resulting spectrum from the same pixel after noise reduction (blue) and to that acquired by averaging 64 scans (red).

Fig. 6 Noise before (input noise) and after application of the algorithm
(output noise). An order of magnitude improvement can be observed.
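The simulation of poor SNR data and the baseline noise measurement can be sketched as follows. The text does not state which noise statistic was used, so the sketch assumes the standard deviation over the 41 points nearest 1975 cm⁻¹, averaged over 1024 spectra; the function names, the 2 cm⁻¹ point spacing and the all-zero "clean" spectra are hypothetical stand-ins for the acquired high SNR data.

```python
import numpy as np

def add_gaussian_noise(spectra, sigma, rng):
    """Return a copy of the spectra with zero-mean normal noise of standard deviation sigma."""
    return spectra + rng.normal(0.0, sigma, size=spectra.shape)

def baseline_noise(spectra, wavenumbers, centre=1975.0, n_points=41):
    """Mean over spectra of the standard deviation in the n_points nearest to `centre`."""
    nearest = np.argsort(np.abs(wavenumbers - centre))[:n_points]
    return float(np.mean(np.std(spectra[:, nearest], axis=1, ddof=1)))

rng = np.random.default_rng(0)
wavenumbers = np.arange(4000.0, 720.0, -2.0)    # 1640 points, assumed spacing
clean = np.zeros((1024, wavenumbers.size))      # placeholder for high SNR spectra

estimates = {
    sigma: baseline_noise(add_gaussian_noise(clean, sigma, rng), wavenumbers)
    for sigma in (0.001, 0.01, 0.1, 0.4)
}
```

Because the chosen window contains no absorption features, the measured value tracks the added σ closely; applying the same measurement before and after noise reduction gives the input and output noise axes of a plot like Fig. 6.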

The  dashed  diagonal  is  the  unity  gain  line  that  separates 
decrease  or  increase  in  noise  upon  application  of  the  algorithm. 
The  plot  indicates  success  in  applicability  over  three  orders  of 
magnitude  of  input  noise  where  an  order  of  magnitude  noise 
reduction  is  observed.  The  actual  noise  reduction  depends  on  the 
number  of  factors  chosen  for  the  inverse  transform,  the  number 
of  pixels  in  the  original  data  set  and  the  degree  of  correlation  in 
the  noise.  If  the  noise  is  high  enough,  the  benefit  is  observed  to  be 
proportional  to  the  input  noise.  For  very  low  noise  cases,  the  plot 
indicates  that  it  becomes  difficult  to  improve  the  data  further. 
This  behavior  likely  arises  from  the  distribution  of  noise  and 
information in eigenimages. It must be noted that many of the
eigenimages rejected in the inverse transform do contain information,
and all those selected do contain noise that is both correlated
and uncorrelated. Hence, the limitation of the process arises from
both correlated noise and the need to balance the information
content of eigenimages with the opportunity to reduce noise. We
have used a fairly conservative approach to noise reduction in
that fewer eigenimages could have been selected in the inversion,
which may also explain the lack of significant improvements
when the input noise is low. It is interesting to note that
a previous application of the MNF transform10 also provided
a limit to the improvement possible with this approach, but in the
high noise limit. There, the high input noise data were found to
contain  a  low  frequency  response  in  the  spectra  of  inverse 
transformed  data  that  limited  the  noise  reduction  achieved.  In 
summary,  the  forward-reverse  transform  approach  appears  to  be 
bounded  in  its  ability  to  improve  data  quality  in  both  the  high 
noise  (as  previously  shown)  and  low  noise  cases  (as  observed 
here).  These  limits  must  be  considered  when  designing  data 
acquisition  protocols  that  take  advantage  of  this  post-processing 
approach. 

From the trading rules of FT-IR spectroscopy,3,31 a factor of
n improvement in SNR requires an increase of n² in data
acquisition time. Hence, a method to increase data acquisition
rate without loss in its quality could involve rapid data collection
at a low SNR followed by application of numerical techniques
for noise reduction. The order of magnitude improvement, as we
show above, allows for close to two orders of magnitude
reduction in scanning time. To test this hypothesis, we compared
noise-reduced data from a single interferometer scan with data
obtained by averaging 64 scans (Fig. 5(C)). Spectra with only one
scan, after noise reduction, closely resemble spectra obtained
from 64 scans experimentally. Caution must be exercised,
however, in claiming that mathematical techniques provide
precisely equivalent data. As can be seen from the spectra, there
are some low frequency noise components in the noise-reduced
spectrum that were not eliminated.32 Noise reduction has
important implications in areas where data quality cannot be
improved by averaging (e.g. kinetics measurements),33 for
low-throughput configurations such as total internal reflection
sampling,34,35,36 where large quantities of data are acquired or
where the analyte signal is low. An interesting test case is to
perform histopathology without human intervention37 faster
than with current data acquisition protocols. Briefly, FT-IR
microspectroscopy combined with pattern recognition tools38 is
rapidly developing as a potential tool for automated structure39
and disease recognition40,41,42 within complex tissue by a number
of groups.43,44 Unfortunately, the time to acquire data from large
numbers of samples is prohibitive. For example, a recent study30
reported the quantitative evaluation of classification using large
sample and data sets that required many months to acquire.
Reducing data acquisition time through automated noise
reduction will help reduce time in laboratory studies. When the
approach is translated to clinical venues, it will serve to enhance
the speed and throughput of sample analysis. As an example, Fig. 7
illustrates the benefits of using automated noise reduction.
Prostate tissue is classified into its constituent cell types.
Classification is inaccurate for the higher noise case but is recovered
when the noise is reduced. The time for data acquisition for this
500 μm × 500 μm image set was reduced from ~45 min to less
than 2 min. While the result demonstrates qualitative agreement
between the classified images, we examine next a detailed quantitative
assessment of the fidelity of inverse transformed data and
the benefits of noise rejection for tissue classification.

Fig. 7 Effect of automated noise reduction on prostate tissue classification.
Top row: classification results. Bottom row: absorbance in a tissue
sample at 1080 cm⁻¹. (A) High SNR data in which measured baseline noise
is ~0.001 a.u. with the corresponding classified images showing three
types of cells. (B) Lower SNR data in which measured baseline noise is
~0.005 a.u., demonstrating that the classification becomes noisy and (C)
noise reduced data set obtained from the data in (B), demonstrating that
the classification errors are reduced.

The accuracy of tissue classification is related to the SNR of the
data, as has been demonstrated previously for prostate tissue.45
Here, we sought to apply the same exercise to breast tissue. We
acquired data from 10 tissue samples consisting of almost 8000
spectra per sample. The samples contain a variety of cell types
and disease states. As a first step towards classification for
disease diagnoses, breast tissue is divided into two cell types
(epithelial cells, which are indicated in green, and stromal cells,
which are indicated in magenta). The effect of decreasing data
quality can be seen in classified images shown in Fig. 8 A-D (top
row). Noise in the underlying absorbance data increases from A
to D; consequently, noise in the classified images increases progressively
until all ability to segment tissue is lost for noise levels of ~0.1
a.u. (Fig. 8D). We further quantified classification accuracy by
calculating the area under the curve (AUC) of the
receiver operating characteristic (ROC) curve46 for pixels that meet the
threshold for classification, as shown in Fig. 8(E). As a function of average
noise in the absorbance data, AUC values finally fall to about
0.5, which is equivalent to random guessing and does not provide
any useful classification information. At the higher noise levels,
some tissue pixels are not even recognized as meeting the
threshold for inclusion. For intermediate noise levels, classification
accuracy decreases.
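The meaning of the AUC values, and why 0.5 corresponds to random guessing, can be seen from a small sketch. It computes the AUC through its rank interpretation (the probability that a pixel of the target class receives a higher score than a pixel of the other class), which is equivalent to integrating the ROC curve; the function name and the toy scores are hypothetical and not taken from the classifier used in the study.

```python
import numpy as np

def roc_auc(scores, labels):
    """AUC as the probability that a positive pixel outscores a negative one (ties count half)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

labels = [1, 1, 1, 0, 0, 0]   # e.g. 1 = epithelium, 0 = stroma
auc_clean = roc_auc([0.9, 0.8, 0.7, 0.3, 0.2, 0.1], labels)   # well separated scores
auc_noisy = roc_auc([0.9, 0.4, 0.7, 0.3, 0.6, 0.1], labels)   # partly overlapping scores
auc_blind = roc_auc([0.5, 0.5, 0.5, 0.5, 0.5, 0.5], labels)   # scores carry no information
```

Well separated scores give an AUC of 1, overlapping scores give an intermediate value, and scores that carry no class information give 0.5.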


Fig.  8  Effect  of  noise  in  the  absorbance  data  on  image  classification  is  illustrated  for  breast  tissue  in  A-D  (top  panel),  where  the  noise  in  the  data  is 
calculated  to  be  0.0001,  0.001,  0.001  and  0.1  a.u.,  respectively.  (E)  Classification  accuracy,  as  measured  by  the  area  under  the  receiver  operating 
characteristic  curve,  decreases  with  increasing  noise  for  both  cell  types.  Image  classification  is  shown  upon  using  the  noise  reduction  algorithm  (A-D, 
bottom panel). (F) Classification accuracy before and after noise reduction.

The impact of noise reduction on classification is demonstrated
in the bottom panel of Fig. 8. Classified images for each
noise-reduced case (A-D) demonstrate that classification accuracy
for all cases appears to be comparable to classification
accuracy for low noise cases. Examination of classified images
and classification accuracy values indicates that noise reduction
improves classifier performance in each case. For low noise data,
noise reduction does not appear to significantly impact classification
since the classification accuracy is almost 100%. On the
other hand, noise reduction significantly improves classification
from FT-IR spectroscopic imaging data with higher noise levels.
Hence, a potential route to faster data acquisition for histopathology,
without the need to modify hardware or change any
experimental configuration, can be proposed based on post-processing
noise reduction. Tolerating a ten-fold increase in the noise of the
data while retaining the same classification accuracy implies that a
~100-fold decrease in data acquisition time may be obtained.
Instead of requiring ~300 h (12 days) to scan a 1 cm × 1 cm area
with a large focal plane detector, the proposed approach will
allow the same in ~3 h. This conclusion is one of the more
important aspects of this study, implying that a careful noise
rejection protocol can speed up data acquisition to make present
FT-IR imaging instrumentation perform analyses within
clinically acceptable time periods.
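The arithmetic behind the time estimate follows from the trading rule quoted earlier (SNR improves as the square root of acquisition time). A minimal sketch, with a hypothetical function name and the 300 h figure taken from the text:

```python
def acquisition_time(reference_time_h, noise_ratio):
    """Scan time when the tolerated raw noise is `noise_ratio` times the reference noise.

    SNR grows as the square root of acquisition time, so the time needed
    scales as 1 / noise_ratio**2.
    """
    return reference_time_h / noise_ratio**2

full_scan_h = 300.0                                  # ~300 h for a 1 cm x 1 cm area
fast_scan_h = acquisition_time(full_scan_h, 10.0)    # ten-fold noisier raw data
```

With a ten-fold noise allowance the 300 h scan shrinks to 3 h, matching the estimate above; the same relation explains why averaging 64 scans buys only an eight-fold gain in SNR.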

5.  Conclusions 

An  objective  eigenimage  selection  scheme  based  on  structural 
features  has  been  proposed  here  for  automated  noise  reduction 
after  data  acquisition.  An  order  of  magnitude  reduction  in  noise 
could  be  achieved  using  this  algorithm  when  the  noise  was  not 
very  low.  Applied  to  obtaining  results  from  samples,  for  example 
for  tissue  classification,  there  is  an  equivalent  recovery  of  correct 
results  at  higher  noise  levels.  The  improvement  translates  directly 
into a reduction in time required for data collection. It must be
noted that the gain here is through post-acquisition computational
techniques and does not involve changes in instrumentation
hardware or data acquisition schemes. Hence, it is easy to
implement and inexpensive to deploy. It is anticipated that the
automated nature of the proposed approach will allow it to
become routinely applied to enhance data quality and to recover
scientific results with lower experimental effort (time, expense
and hardware) in data acquisition.

There is an underlying assumption in most model building processes: given a learned classifier,
it should be usable to explain unseen data from the same given problem. Despite this
seemingly reasonable assumption, when dealing with biological data it tends to fail:
data generated using the same protocols in two different laboratories
can lead to two different, non-interchangeable classifiers. There are usually too many
uncontrollable variables in the process of generating data in the lab, as well as biological variations,
and small differences can lead to very different data distributions, with a fracture
between data.

This paper presents a genetics-based machine learning approach that performs feature
extraction on data from a lab to help increase the classification performance of an existing
classifier that was built using the data from a different laboratory which uses the same protocols,
while learning about the shape of the fractures between data that motivated the bad
behavior.

The experimental analysis over benchmark problems together with a real-world problem
on prostate cancer diagnosis shows the good behavior of the proposed algorithm.

©  2010  Elsevier  Inc.  All  rights  reserved. 


1.  Introduction 

The assumption that a properly trained classifier will be able to predict the behavior of unseen data from the same problem
is at the core of any automatic classification process. However, this hypothesis tends to prove unreliable when dealing
with data from biology (or other experimental sciences), especially when such data are provided by more than one laboratory,
even if they are following the same protocols to obtain them.

The specific problem this paper attempts to solve is the following: we have data from one laboratory (dataset A), and derive
a classifier from it that can predict its category accurately. We are then presented with data from a second laboratory
(dataset B). This second dataset is not accurately predicted by the classifier we had previously built due to a fracture between
the data of both laboratories. We intend to find a transformation of dataset B (dataset S) where the classifier works.
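A toy illustration of the roles of datasets A, B and S may help fix ideas. Everything in it is hypothetical: the one-feature threshold classifier, the numbers, and the linear fracture between laboratories. In particular, the repair is written by hand here, whereas the paper's goal is to discover such a transformation automatically.

```python
def classify(x, threshold=5.0):
    """Existing classifier, built on dataset A: class 1 when the feature exceeds the threshold."""
    return 1 if x > threshold else 0

def accuracy(samples):
    """Fraction of (feature, label) pairs that the existing classifier labels correctly."""
    return sum(classify(x) == label for x, label in samples) / len(samples)

dataset_a = [(2.0, 0), (3.0, 0), (4.0, 0), (6.0, 1), (7.0, 1), (8.0, 1)]

# Hypothetical fracture: laboratory B reports the same quantity as 2 * x + 10.
dataset_b = [(2.0 * x + 10.0, label) for x, label in dataset_a]

# Hand-written repair standing in for the transformation to be found.
dataset_s = [((x - 10.0) / 2.0, label) for x, label in dataset_b]

accuracy_a = accuracy(dataset_a)
accuracy_b = accuracy(dataset_b)
accuracy_s = accuracy(dataset_s)
```

The classifier is perfect on A, no better than chance on B, and perfect again on S, without the classifier itself ever being retrained.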

Evolutionary computing, as introduced by Holland [27], is based on the idea of the survival of the fittest, evoked by the
natural evolutionary process. In genetic algorithms (GAs) [21], solutions (genes) are more likely to reproduce the fitter
they are, and random sporadic mutations help maintain population diversity. Genetic Programming (GP) [33] is a development
of those techniques, and follows a similar pattern to evolve tree-shaped solutions using variable-length
chromosomes.

Feature extraction, as defined by Wyse et al. [56], ‘consists of the extraction a set of new features from the original features
through some functional mapping’. Our approach to the problem can be seen as feature extraction, since we build a
new  set  of  features  which  are  functions  of  the  old  ones.  However,  we  have  a  different  goal  than  that  of  classical  feature 
extraction,  since  our  intention  is  to  fit  a  dataset  to  an  already  existing  classifier,  not  to  improve  the  performance  of  a  future 
one. 

In this work, we intend to demonstrate the use of GP-based feature extraction to unveil transformations in order to improve
the accuracy of a previously built classifier, by performing feature extraction on a dataset where said classifier should,
in principle, work, but where it does not perform accurately enough. We test our algorithm first on artificially-built problems
(where we apply ad hoc transformations to datasets from which a classifier has been built, and use the dataset resulting from
those transformations as our problem dataset), and then on a real-world application where biological data from two different
medical laboratories regarding prostate cancer diagnosis are used as datasets A and B.

Even though the method proposed in this paper does not attempt to reduce the number of features or instances in the
dataset, it can still be regarded as a form of data reduction because it unifies the data distributions of two datasets, which
results in the capability of applying the same classifier to both of them, instead of needing two different classifiers, one for
each dataset.

The  remainder  of  this  paper  is  organized  as  follows:  in  Section  2,  some  preliminaries  about  the  techniques  used  and 
some  approaches  to  similar  problems  in  the  literature  are  presented.  Section  3  details  the  real-world  biological  problem 
that  motivates  this  paper.  Section  4  has  a  description  of  the  proposed  algorithm  GP-RFD;  and  Section  5  includes  the 
experimental  setup,  along  with  the  results  obtained,  and  an  analysis.  Finally,  in  Section  6  some  concluding  remarks 
are  made. 

2.  Preliminaries 

This  section  is  divided  in  the  following  way:  in  Subsection  2.1  we  introduce  the  notation  that  has  been  used  in  this  paper. 
Then  we  include  an  introduction  to  GP  in  Subsection  2.2,  a  brief  summary  of  what  has  been  done  in  feature  extraction  in 
Subsection  2.3,  and  a  short  review  of  the  different  approaches  we  found  in  the  specialized  literature  on  the  use  of  GP  for 
feature extraction in Subsection 2.4. We conclude by mentioning some works related to finding and repairing fractures
between data in Subsection 2.5.

2.1. Notation

A  classification  problem  is  considered  with: 

• A set of input variables $X = \{x_i / i = 1, \ldots, n_v\}$, where $n_v$ is the number of features (attributes) of the problem.

• A set of values for the target variable (class) $C = \{C_j / j = 1, \ldots, n_c\}$, where $n_c$ is the number of different values for the class
variable.

• A set of examples $E = \{e^h = (e^h_1, \ldots, e^h_{n_v}, C^h) / h = 1, \ldots, n_e\}$, where $C^h$ is the class label for the sample $e^h$, and $n_e$ is the number
of examples.

When  describing  the  problem,  we  mention  datasets  A,  B  and  S.  They  correspond  to: 

•  A:  the  original  dataset  that  was  used  to  build  the  classifier. 

•  B:  the  problem  dataset.  The  classifier  is  not  accurate  on  this  dataset,  and  that  is  what  the  proposed  algorithm  attempts  to 
solve. 

•  S:  the  solution  dataset,  result  of  applying  the  evolved  transformation  to  the  samples  in  dataset  B.  The  goal  is  to  have  the 
classifier  performance  be  as  high  as  possible  on  this  dataset. 

When performing experiments and obtaining the evolved expressions, we use the following notation: when artificially
creating a dataset B by means of a fabricated transformation over dataset A, we let $B = \{b_i / i = 1, \ldots, n_v\}$ be the attributes
in dataset B and $A = \{a_i / i = 1, \ldots, n_v\}$ be the ones from dataset A. In Appendix A, we show the learned transformations for
the prostate cancer problem. The attributes shown are those corresponding to dataset S, and are represented as
$S = \{s_i / i = 1, \ldots, n_v\}$.

2.2.  Genetic  programming 

A  GA  [21]  is  a  stochastic  optimization  technique  inspired  by  nature’s  development  of  useful  characters.  It  is  based  on  the 
idea  of  survival  of  the  fittest  [11]  in  the  following  way:  given  a  population  of  possible  solutions  to  a  problem  (represented  by 
chromosomes), there is some selection procedure that favors the fitter ones (i.e., the ones that provide a higher-quality solution);
and the selected chromosomes get an opportunity to pass down their genetic material to the next generation via some
crossover operator, which usually builds new individuals from the combination of old ones. In some variations of the algorithm,
random mutations are sporadically introduced to help maintain biological diversity in the population.

GP,  as  proposed  by  John  Koza  in  1992  [33],  uses  a  selectorecombinative  schema  where  the  solutions  are  represented  by 
trees;  which  are  encoded  as  variable-length  chromosomes.  It  was  originally  designed  to  automatically  develop  programs,  but 
it  has  been  used  for  multiple  purposes  due  to  its  high  expressive  power  and  flexibility.  In  the  words  of  Poli  and  Langdon  [46], 
‘GP  is  a  systematic,  domain-independent  method  for  getting  computers  to  solve  problems  automatically  starting  from  a 
high-level statement of what needs to be done. Using ideas from natural evolution, GP starts from an ooze of random computer
programs, and progressively refines them through processes of mutation and sexual recombination, until solutions
emerge.  This  is  all  done  without  the  user  having  to  know  or  specify  the  form  or  structure  of  solutions  in  advance.  GP  has 
generated  a  plethora  of  human-competitive  results  and  applications,  including  novel  scientific  discoveries  and  patentable 
inventions’. 

There  are  a  few  details  about  GP  that  make  it  different  from  standard  GAs: 

• Crossover: the most commonly used operator is one-point crossover, which is analogous to the classical GA operator,
except that the cut point is marked by a subtree rather than by a specific gene.

• Mutation: even though mutation was used in the early literature regarding the evolution of programs (see [7,10,16]), Koza chose not
to use it [33,34], as he wished to demonstrate that mutation was not necessary. This significantly influenced the field,
and mutation was often omitted from GP runs. However, mutation has proved useful since then (see [5,42], for example),
and it is widely used nowadays. Multiple mutation operators have been proposed in the literature [44].

•  Treatment  of  constants:  the  discovery  of  constants  is  one  of  the  hardest  issues  in  GP.  Koza  proposed  a  solution 
called  Ephemeral  Random  Constant  (ERC),  which  uses  a  fixed  terminal  (e)  to  represent  a  constant.  The  first  time 
one  of  such  constants  is  evaluated,  it  gets  assigned  a  random  value.  From  there  on,  it  retains  that  value  throughout 
the  whole  run.  A  number  of  alternatives  have  been  proposed  in  the  literature  [14,49],  but  ERC  remains  the  most 
used  one. 

• Automatically defined functions: ADFs were also first proposed by Koza [34]. The idea is to permit each individual to
evolve  more  than  one  tree  simultaneously;  having  the  extra  trees  work  as  primitives  that  can  be  called  from  the  main 
one. 

GP  has  been  applied  often  to  classification  [13].  Among  the  latest  advances  in  the  field,  we  would  like  to  mention  those 
dedicated  to  high  dimensional  problems  [35,6],  variations  in  population  size  [31,32],  and  applications  to  other  related  fields 
[58,3].

2.3.  Feature  extraction 

Feature extraction creates new features as functional mappings of the old ones. It has been used both as a form of preprocessing,
which is the approach we use in this paper, and embedded within a learning process in wrapper techniques.
An early proposer of the term was probably Wyse in 1980, in a paper about intrinsic dimensionality estimation [56]. Multiple
techniques have been applied to feature extraction throughout the years, ranging from principal component
analysis to support vector machines to GAs (see [28,45,43], respectively, for some examples).

Among the foundational works in the literature, Liu’s book in 1998 [38] is one of the earlier compilations of the field. As a
result of a workshop held in 2003 [24], Guyon and Elisseeff published a book with an important treatment of the foundations
[25].

2.4.  Genetic  programming-based  feature  extraction 

GP  has  been  used  extensively  to  optimize  feature  extraction  and  selection  tasks.  One  of  the  first  contributions  in  this  line 
was  the  one  published  by  Tackett  in  1993  [53],  who  applied  GP  to  feature  discovery  and  image  discrimination  tasks. 

We  can  consider  two  main  branches  in  the  philosophy  of  GP-based  feature  extraction: 

On one hand, we have the proposals that focus only on the feature extraction procedure, of which there are multiple
examples: Sherrah et al. [50] presented in 1997 the evolutionary pre-processor (EPrep), which searches for an optimal feature
extractor by minimizing the misclassification error over three randomly selected classifiers. Kotani et al.’s work from
1999 [30] determined the optimal polynomial combinations of raw features to pass to a k-nearest neighbor classifier. In
2001, Bot [8] evolved transformed features, one at a time, again for a k-NN classifier, utilizing each new feature only if it
improved the overall classification performance. Zhang and Rockett [61], in 2006, used multiobjective GP to learn optimal
feature extraction in order to fold the high-dimensional pattern vector to a one-dimensional decision space where the classification
would be trivial. Lastly, also in 2006, Guo and Nandi [23] optimized a modified Fisher discriminant using GP, and
then Zhang et al. extended their work by using a multiobjective approach to prevent tree bloat [62], and applied a similar
method to spam filtering [60].

On the other hand, some authors have chosen to evolve a full classifier with an embedded feature extraction step. As an
example, Harris [26] proposed in 1997 a co-evolutionary strategy involving the simultaneous evolution of the feature extraction
procedure along with a classifier. More recently, Smith and Bull [52] developed a hybrid feature construction and selection
method using GP together with a GA. FLGP, by Yin et al. [37], is yet another example, where ‘new features extracted by
certain layer are used to be the training set of next layer’s populations’.

2.5.  Finding  and  repairing  fractures  between  data 

Throughout the literature there have been a number of proposals to quantify the amount of dataset shift (in other words, the
size of the fracture in the data). This shift is usually due to time passing (the data comes from the same source at a later
time), but it can also be due to the data originating from different sources, as is the case in this paper. Some of the most
relevant works in the field are the following. Wang et al. [54] present the idea of correspondence tracing. They propose
an algorithm for discovering changes in classification characteristics, which is based on the comparison between two
rule-based classifiers, one built from each dataset. Yang et al. [57] presented in 2008 the idea of conceptual equivalence as a
method for contrast mining, which consists of the discovery of discrepancies between datasets. More recently,
Cieslak and Chawla [9] presented a statistical framework to analyze changes in data distribution
resulting in fractures between the data.

A different approach to fixing data fractures relies on the adaptation of the classifier. Quiñonero-Candela et al. [47] edited
a  very  interesting  book  on  the  topic,  including  several  specific  proposals  to  repair  fractures  between  data  (what  they  call 
dataset  shift).  There  are  two  main  differences  between  the  usual  proposals  in  the  literature  and  this  contribution:  first,  they 
are  most  often  based  on  altering  the  classifier,  while  we  propose  keeping  it  intact  and  transforming  the  data.  Second,  most 
authors  focus  on  covariate  shift,  a  specific  kind  of  data  fracture,  but  the  method  we  propose  here  is  more  general  and  can 
tackle  any  kind  of  shift. 


3.  Case  study:  prostate  cancer  diagnosis 

This  section  begins  with  an  introduction  to  the  importance  of  the  problem  in  Subsection  3.1.  The  diagnostic  procedure  is 
summarized  in  Subsection  3.2,  and  the  reason  to  apply  GP-RFD  to  this  problem  is  shown  in  Subsection  3.3.  Finally,  the 
preprocessing  the  data  went  through  is  presented  in  Subsection  3.4. 

3.1.  Motivation 

Prostate  cancer  is  the  most  common  non-skin  malignancy  in  the  western  world.  The  American  Cancer  Society  estimated 
192,280 new cases and 27,360 deaths related to prostate cancer in 2009 [2]. Recognizing the public health implications of
this  disease,  men  are  actively  screened  through  digital  rectal  examinations  and/or  serum  prostate  specific  antigen  (PSA)  level 
testing.  If  these  screening  tests  are  suspicious,  prostate  tissue  is  extracted,  or  biopsied,  from  the  patient  and  examined  for 
structural  alterations.  Due  to  imperfect  screening  technologies  and  repeated  examinations,  it  is  estimated  that  more  than 
one  million  people  undergo  biopsies  in  the  US  alone. 

3.2.  Diagnostic  procedure 

Biopsy, followed by manual examination under a microscope, is the primary means to definitively diagnose prostate cancer
as well as most internal cancers in the human body. Pathologists are trained to recognize patterns of disease in the architecture
of tissue, local structural morphology and alterations in cell size and shape. Specific patterns of specific cell types
distinguish  cancerous  and  non-cancerous  tissues.  Hence,  the  primary  task  of  the  pathologist  examining  tissue  for  cancer 
is  to  locate  foci  of  the  cell  of  interest  and  examine  them  for  alterations  indicative  of  disease.  A  detailed  explanation  of  the 
procedure  is  beyond  the  scope  of  this  paper  and  can  be  found  elsewhere  [36,41,40]. 

Operator  fatigue  is  well-documented  and  guidelines  limit  the  workload  and  rate  of  examination  of  samples  by  a  single 
operator (examination speed and throughput). Importantly, inter- and intra-pathologist variation complicates decision making.
For this reason, it would be extremely interesting to have an accurate automatic classifier to help reduce the load on the
pathologists.  This  was  partially  achieved  in  [41],  but  some  issues  remain  open. 

3.3.  The  generalization  problem 

Llorà et al. [41] successfully applied a genetics-based approach to the development of a classifier that obtained human-competitive
results based on FTIR data. However, the classifier built from the data obtained from one laboratory proved
remarkably inaccurate when applied to classify data from a different hospital. The experimental procedure was
identical in both cases (the same machine, measuring and post-processing, and the exact same lab protocols for both tissue
extraction and staining), so no known factor could explain this discrepancy.

What we attempt to do with this work is develop an algorithm that can evolve a transformation over the data from the
second laboratory, creating a new dataset where the classifier built from the first lab is as accurate as possible. This evolved
transformation would also provide valuable information, since it would allow the scientists processing the tissues to analyze
the differences between their results and those of other hospitals.


3.4.  Pre-processing  of  the  data 

The biological data obtained from the laboratories is enormous (in the range of 14 GB of storage per sample),
and parallel computing was needed to achieve better-than-human results. For this reason, feature selection was
performed on the dataset obtained by FTIR. It was done by evaluating pairwise error and the incremental
increase in classification accuracy for every class, resulting in a subset of 93 attributes. This reduced dataset provided
enough information for classifier performance to be rather satisfactory: a simple C4.5 classifier achieved ~95% accuracy
on the data from the first lab, but only ~80% on the second one. The dataset consists of 789 samples from one laboratory
and 665 from the other one. These samples represent 0.01% of the total data available for each dataset, and were
selected by applying stratified sampling without replacement. A detailed description of the data pre-processing procedure
can be found in [15].


4.  A  proposal  for  GP-based  feature  extraction  for  the  repairing  of  fractures  between  data  (GP-RFD) 

This  section  is  presented  in  the  following  way:  first,  a  justification  for  the  choice  of  GP  is  included.  Subsection  4.1  details 
how the solutions are represented, then the fitness evaluation procedure and the genetic operators are introduced in Subsections
4.2 and 4.3 respectively. Then, the parameter choices are explained in Subsection 4.4, while the function set is in
Subsection  4.5.  Finally,  the  execution  flow  of  the  whole  procedure  is  shown  in  Subsection  4.6. 

The  problem  we  are  attempting  to  solve  is  the  design  of  a  method  that  can  create  a  transformation  from  a  dataset  (dataset 
B)  where  a  classification  model  is  not  accurate  enough  into  a  new  one  where  it  is  (dataset  S).  Said  classifier  is  kept  unchanged 
throughout  the  process. 

We  decided  to  use  GP  to  solve  the  problem  for  a  number  of  reasons:  first,  it  is  well  suited  to  evolve  arbitrary  expressions 
because  its  chromosomes  are  trees.  This  is  useful  in  our  case  because  we  want  to  have  the  maximum  possible  flexibility  in 
terms of the functional expressions that can be present in the feature extraction procedure. Second, GP provides highly interpretable
solutions. This is an advantage because our goal is not only to have a new dataset where the classifier works, but
also  to  analyze  what  was  the  problem  in  the  first  dataset. 

The specific decisions to be made once GP was chosen as the technique to apply are how to represent the solutions, what
terminals and operators to choose, how to calculate the fitness of an individual and which evolutionary parameters (population
size, number of generations, selection and mutation rates, etc.) are appropriate for each specific problem. To clarify
some of the points, let us have a binary 2-dimensional problem as an example, and let us use a function set composed of
{+, -, *, ÷}.

4.1.  Solutions  representation:  context-free  grammar 

The representation issue was solved by extending GP to evolve more than one tree per solution. Each individual is composed
of n trees, where $n = n_v$, the number of attributes present in the dataset (we are trying to develop a new dataset with
the  same  number  of  attributes  as  the  old  one).  In  the  tree  structure,  the  leaves  are  either  constants  (we  use  the  Ephemeral 
Random  Constant  approach)  or  attributes  from  the  original  dataset.  The  intermediate  nodes  are  functions  from  the  function 
set,  which  is  specific  to  each  problem. 

The  attributes  on  the  transformed  dataset  are  represented  by  algebraic  expressions.  These  expressions  are  generated 
according to the rules of a context-free grammar which allows the absence of some of the functions or terminals. The grammar
corresponding to the example problem would look like this:

Start → Tree Tree
Tree → Node
Node → Node Operator Node
Node → Terminal
Operator → + | - | * | ÷
Terminal → x0 | x1 | E
E → realNumber (represented by e)
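
As a minimal sketch of how expressions can be drawn from this grammar (an illustration, not the authors' implementation), the Python below grows one random tree per attribute. The nested-tuple encoding, the 0.5 probability of stopping at a terminal, the constant range [-1, 1], the default depth limit and the names `random_tree` and `random_individual` are all assumptions made for the example; `/` stands for the ÷ operator.

```python
import random

OPERATORS = ("+", "-", "*", "/")   # "/" stands for the ÷ of the grammar


def random_tree(rng, n_attributes, max_depth):
    """Grow one expression tree following Node -> Node Operator Node | Terminal."""
    # Node -> Terminal: forced at the depth limit, otherwise chosen with an
    # assumed probability of 0.5.
    if max_depth == 0 or rng.random() < 0.5:
        choice = rng.randrange(n_attributes + 1)
        if choice == n_attributes:
            # Terminal -> E: an ephemeral random constant (assumed range).
            return rng.uniform(-1.0, 1.0)
        return f"x{choice}"            # Terminal -> x0 | x1 | ...
    # Node -> Node Operator Node
    left = random_tree(rng, n_attributes, max_depth - 1)
    right = random_tree(rng, n_attributes, max_depth - 1)
    return (rng.choice(OPERATORS), left, right)


def random_individual(rng, n_attributes, max_depth=3):
    """Start -> Tree ... Tree: one tree per attribute of the dataset."""
    return [random_tree(rng, n_attributes, max_depth) for _ in range(n_attributes)]


individual = random_individual(random.Random(0), n_attributes=2)
```

Leaves are either attribute names or floats, and inner nodes are `(operator, left, right)` tuples, so every generated tree is a sentence of the grammar by construction.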

An individual in the example problem would have two trees; and each of them would be allowed to have any of the functions
in the function set, which for this example is {+, -, *, ÷}, in their intermediate nodes; and any of {x0, x1, e} in the leaves. This,
for example, would be a legal individual:
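
One way to write such an individual down is sketched below in Python; it is a hypothetical example, not one taken from the original work. The nested-tuple encoding, the helper names `evaluate` and `transform`, and the protected division (returning 1.0 on a zero divisor) are assumptions made for illustration only.

```python
import operator


def protected_div(a, b):
    """Division that returns 1.0 for a zero divisor (an assumed convention)."""
    return a / b if b != 0 else 1.0


FUNCTIONS = {"+": operator.add, "-": operator.sub,
             "*": operator.mul, "/": protected_div}


def evaluate(tree, sample):
    """Evaluate one expression tree on a sample (a list of attribute values)."""
    if isinstance(tree, tuple):                 # intermediate node
        op, left, right = tree
        return FUNCTIONS[op](evaluate(left, sample), evaluate(right, sample))
    if isinstance(tree, str):                   # attribute leaf such as "x0"
        return sample[int(tree[1:])]
    return float(tree)                          # constant leaf (e)


def transform(individual, sample):
    """Apply every tree of the individual to build the transformed sample."""
    return [evaluate(tree, sample) for tree in individual]


# Hypothetical legal individual with two trees:
#   first tree:  x0 * 3 + 2
#   second tree: x1
individual = [("+", ("*", "x0", 3.0), 2.0), "x1"]
new_sample = transform(individual, [1.0, 5.0])
```

Each tree produces one attribute of the transformed sample, so an individual with two trees maps a 2-dimensional sample to a 2-dimensional one.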

4.2. Fitness evaluation

The fitness evaluation procedure is probably the most widely discussed design aspect in the literature on
GP-based feature extraction. As has been stated before, the idea is to have the provided classifier’s performance drive the
evolution.  To  achieve  that,  GP-RFD  calculates  fitness  in  the  following  way: 

1.  Prerequisite:  a  previously  built  classifier  (the  one  built  from  dataset  A)  needs  to  be  provided.  It  is  used  as  a  black 
box. 

2.  Given  an  individual  composed  of  a  list  of  expression  trees  (one  corresponding  to  each  extracted  attribute),  a  new 
dataset  (dataset  S)  is  built  applying  the  transformations  encoded  in  those  expression  trees  to  all  the  samples  in 
dataset  B. 

3.  The  fitness  of  the  individual  is  the  classifier’s  accuracy  on  dataset  S  (training-set  accuracy),  calculated  as  the  ratio  of 
correctly  classified  samples  over  the  total  number  of  samples. 

Fig.  1  presents  a  schematic  representation  of  the  procedure. 
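
The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: each tree of an individual is modelled as a plain callable, the classifier is any callable returning a label, and the names `build_dataset_s` and `fitness` as well as the toy data are hypothetical.

```python
def build_dataset_s(individual, dataset_b):
    """Step 2: apply every tree of the individual to every sample of dataset B."""
    return [[tree(sample) for tree in individual] for sample in dataset_b]


def fitness(individual, classifier, dataset_b, labels_b):
    """Step 3: accuracy of the unchanged, black-box classifier on dataset S."""
    dataset_s = build_dataset_s(individual, dataset_b)
    correct = sum(1 for sample, label in zip(dataset_s, labels_b)
                  if classifier(sample) == label)
    return correct / len(labels_b)


# Toy setting (hypothetical): the classifier was built where class 1 means
# "attribute 0 is positive"; dataset B has the same data shifted by +10.
classifier = lambda sample: 1 if sample[0] > 0 else 0      # step 1: given
dataset_b = [[9.0], [9.5], [11.0], [12.0]]
labels_b = [0, 0, 1, 1]

identity = [lambda s: s[0]]            # leaves dataset B untouched
repair = [lambda s: s[0] - 10.0]       # undoes the shift

fitness_identity = fitness(identity, classifier, dataset_b, labels_b)
fitness_repair = fitness(repair, classifier, dataset_b, labels_b)
```

In this toy case the identity transformation scores 0.5 and the shift-undoing one scores 1.0, which is the kind of gradient the evolutionary search relies on.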

4.3.  Genetic  operators 

This  section  details  the  choices  made  for  selection,  crossover  and  mutation  operators.  Since  the  objective  of  this  work  is 
not  to  squeeze  the  maximum  possible  performance  from  GP,  but  rather  to  show  that  it  is  an  appropriate  technique  for  the 
problem  and  that  it  can  indeed  solve  it,  we  did  not  pay  special  attention  to  these  choices,  and  picked  the  most  common  ones 
in  the  specialized  literature. 

•  Tournament  selection  without  replacement.  To  perform  this  selection,  k  individuals  are  first  randomly  picked  from  the 
population  (where  k  is  the  tournament  size),  while  avoiding  using  any  member  of  the  population  more  than  once.  The 
selected  individual  is  then  chosen  as  the  one  with  the  best  fitness  among  those  picked  in  the  first  stage. 

•  One-point  crossover:  for  each  dimension,  a  subtree  from  one  of  the  parents  is  substituted  by  one  from  the  other  parent. 
The  procedure  is  specified  in  Algorithm  1.  An  example,  for  one  of  the  dimensions  only,  can  be  seen  in  Fig.  2. 


Fig.  1.  Schematic  representation  of  the  fitness  evaluation  procedure. 
Fig. 2. Crossover example for one of the dimensions only; this is repeated for all dimensions (trees) on each individual.


• Swap mutation: this is a conservative mutation operator that helps diversify the search within a close neighborhood of a
given solution. It consists of exchanging the primitive associated with a node for one that has the same number of
arguments.


Algorithm  1.  One-point  crossover  procedure 
FORALL  trees  on  each  individual 

1. Randomly select a non-root non-leaf node on each of the two parents.

2.  The  first  child  is  the  result  of  swapping  the  subtree  below  the  selected  node  in  the  father  for  that  of  the  mother. 

3.  The  second  child  is  the  result  of  swapping  the  subtree  below  the  selected  node  in  the  mother  for  that  of  the  father. 


•  Replacement  mutation:  this  is  a  more  aggressive  mutation  operator  that  leads  to  diversification  in  a  larger  neighborhood. 
The  procedure  to  perform  this  mutation  is  the  following: 

1. Randomly select a non-root non-leaf node on the tree to mutate.

2. Create a random tree of depth no more than a fixed maximum depth. This parameter was not tuned, since
the goal of this study was not to squeeze the maximum performance out of the proposed method, but rather to check
its viability. Future work could tackle this issue.

3.  Swap  the  subtree  below  the  selected  node  for  the  randomly  generated  one. 
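
The selection, crossover and swap-mutation operators described in this subsection can be sketched as follows. This is a hedged illustration on nested-tuple trees, not the implementation used in the experiments. Several points are assumptions: "the subtree below the selected node" is read as the subtree rooted at that node, trees without an eligible node are copied unchanged, the tournament only guarantees distinct contenders within a single tournament, and swap mutation is applied with a per-node probability. Replacement mutation is omitted for brevity.

```python
import random


def tournament_select(population, fitnesses, k, rng):
    """Pick k distinct individuals at random and return the fittest of them."""
    contenders = rng.sample(range(len(population)), k)
    return population[max(contenders, key=lambda i: fitnesses[i])]


def inner_paths(tree, path=()):
    """Paths (tuples of child indices) to every non-root, non-leaf node."""
    if not isinstance(tree, tuple):
        return []
    paths = [path] if path else []
    for i, child in enumerate(tree[1:], start=1):
        paths += inner_paths(child, path + (i,))
    return paths


def get_subtree(tree, path):
    for i in path:
        tree = tree[i]
    return tree


def put_subtree(tree, path, new):
    """Return a copy of tree with the subtree at path replaced by new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (put_subtree(tree[i], path[1:], new),) + tree[i + 1:]


def one_point_crossover(father, mother, rng):
    """Apply Algorithm 1 to every pair of trees of the two parents."""
    child1, child2 = [], []
    for f_tree, m_tree in zip(father, mother):
        f_paths, m_paths = inner_paths(f_tree), inner_paths(m_tree)
        if f_paths and m_paths:
            fp, mp = rng.choice(f_paths), rng.choice(m_paths)
            f_sub, m_sub = get_subtree(f_tree, fp), get_subtree(m_tree, mp)
            f_tree, m_tree = (put_subtree(f_tree, fp, m_sub),
                              put_subtree(m_tree, mp, f_sub))
        # Assumed fallback: trees without an eligible node are copied unchanged.
        child1.append(f_tree)
        child2.append(m_tree)
    return child1, child2


def swap_mutation(tree, functions, terminals, p, rng):
    """With probability p per node, exchange its primitive for one of equal arity."""
    if isinstance(tree, tuple):
        op = rng.choice(functions) if rng.random() < p else tree[0]
        return (op,) + tuple(swap_mutation(c, functions, terminals, p, rng)
                             for c in tree[1:])
    return rng.choice(terminals) if rng.random() < p else tree
```

Because all functions in the example function set are binary, swapping an operator for any other operator (or a leaf for any other leaf) always preserves the shape of the tree.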

4.4.  Parameters 

The evolutionary parameters that were used for the experimental study are detailed in Table 1. As mentioned before,
not much attention was paid to optimizing the parameters. Because of this, the crossover and mutation probabilities,
along with the number of generations to run, were fixed to the usual values in the literature (we could call them ‘default
values’) and were not changed in any of the experiments.

Some of the evolutionary parameters are problem dependent. To select an appropriate value for them we used the following
rules:

• Population size: since the only measure of difficulty we know a priori about each of our problems is the number of attributes
present in the dataset ($n_v$), we fix the population size as a function of it. In the experiments carried out in
this study, we found $400 * n_v$ to be a large enough population to achieve satisfactory results. This parameter is problem-dependent,
so what we are fixing here is an upper bound for the population size needed. We found that, by following this
rule, GP-RFD consistently achieved good results, being able to solve the harder transformations, even though it was
excessive for the easier ones and thus resulted in slower execution times. If harder problems than the ones studied in this
paper were to be tackled, this parameter might need to be revised.

• Tournament size: since we are increasing the population size as a function of $n_v$, an increase of the selection pressure is
needed too. The formula we used to calculate tournament size is $\log_2(n_v) + 1$. Again, this empirical estimation produced
the best results: excessive pressure caused premature convergence to local optima, while insufficient
pressure prevented GP-RFD from converging at all.


Table 1
Evolutionary parameters for an $n_v$-dimensional problem.

Parameter                                  Value
Number of trees                            $n_v$
Population size                            $400 * n_v$
Duration of the run                        50 generations
Selection operator                         Tournament without replacement
Tournament size                            $\log_2(n_v) + 1$
Crossover operator                         One-point crossover
Crossover probability                      0.9
Mutation operator                          Replacement & Swap mutations
Replacement mutation probability           0.001
Swap mutation probability                  0.01
Maximum depth of the swapped in subtree    5
Function set                               Problem dependent
Terminal set                               $\{x_0, x_1, \ldots, x_{n_v - 1}, e\}$
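
The population-size and tournament-size rules can be computed directly from the number of attributes. The short Python sketch below is illustrative only; the function names are hypothetical, and rounding $\log_2(n_v)$ down to an integer is an assumption (it reproduces the tournament sizes listed for the benchmark datasets in Table 6, whereas for the 93-attribute cancer data it gives 7 where Table 6 lists 6).

```python
def population_size(n_v):
    """Population size rule: 400 individuals per attribute."""
    return 400 * n_v


def tournament_size(n_v):
    """Tournament size rule log2(n_v) + 1, with log2 rounded down (assumed).

    For a positive integer, int.bit_length() equals floor(log2(n)) + 1,
    which avoids floating-point rounding issues.
    """
    return n_v.bit_length()


# (attributes, population size, tournament size) for a few benchmark datasets
sizes = [(n_v, population_size(n_v), tournament_size(n_v))
         for n_v in (2, 4, 13, 60)]
```

With these rules, a 13-attribute dataset such as Heart or Wine gets a population of 5200 individuals and tournaments of size 4.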


Table 2
Datasets used.

Dataset           Attributes  Samples  Classes  Class distribution  Attr. type
Linear synthetic  2           1000     2        50-50%              Real
Tao               2           1888     2        50-50%              Real
Iris              4           150      3        33-33-33%           Real
Phoneme           5           5404     2        70-30%              Real
Wisconsin         9           683      2        65-35%              Real
Heart             13          270      2        55-45%              Real
Wine              13          178      3        33-39-27%           Real
Wdbc              30          569      2        65-45%              Real
Ionosphere        34          351      2        65-45%              Real
Sonar             60          208      2        54-46%              Real
Mux-11            11          2048     2        50-50%              Nominal
Cancer (A)        93          789      2        60-40%              Real
Cancer (B)        93          665      2        60-40%              Real

Table 3
Transformations performed on the Tao dataset.

Experiment           Transformation applied
Rotation             $b_0 = a_0 * \cos(1) + a_1 * \sin(1)$
                     $b_1 = a_0 * \sin(1) + a_1 * \cos(1)$
Translate & extrude  $b_0 = a_0 * 3 + 2$

Table 4
Transformations performed on the UCI and ELENA datasets.

Dataset     In-set transformation     Out-of-set transformation
Iris        $b_2 = a_2 + a_2$         $b_3 = e^{a_3}$
Phoneme     $b_0 = a_0 - 0.4$         $b_0 = \sin(a_0)$
            $b_3 = a_3 * 2.5$         $b_3 = \cos(a_3)$
Wisconsin   $b_1 = a_1 + 2$           $b_1 = \cos(a_1)$
            $b_5 = a_5 * 3$           $b_5 = \sin(a_5)$
Heart       $b_2 = a_2 * 2$           $b_2 = \sin(a_2)$
            $b_{11} = a_{11} + 3$     $b_{11} = e^{a_{11}}$
Wine        $b_9 = a_9 - 1$           $b_9 = \sin(a_9)$
            $b_{12} = a_{12} * 2$     $b_{12} = \cos(a_{12})$
Wdbc        $b_{26} = a_{26} - 1$     $b_{26} = \sin(a_{26})$
            $b_{27} = a_{27} * 3$     $b_{27} = \cos(a_{27})$
Ionosphere  $b_4 = a_4 - 0.5$         $b_4 = e^{a_4}$
            $b_7 = a_7 * 2$           $b_7 = \sin(a_7)$
Sonar       $b_7 = a_7 + 0.3$         $b_7 = \sin(a_7)$
            $b_{43} = a_{43} * 2$     $b_{43} = e^{a_{43}}$

Table 5
Transformations performed on the Multiplexer-11 dataset.

Experiment   Transformation applied
Bit flip     $b_1 = \mathrm{not}(a_1)$
Column swap  $b_1 = a_2$
             $b_2 = a_3$
             $b_3 = a_1$

Table 6
Experimental parameters.

Dataset           Population size  Tournament size  Function set
Linear synthetic  800              2                {+, -, *, ÷}
Tao               800              2                {+, -, *, ÷}
Iris              1600             3                {+, -, *, ÷}
Phoneme           2000             3                {+, -, *, ÷}
Wisconsin         3600             4                {+, -, *, ÷}
Heart             5200             4                {+, -, *, ÷}
Wine              5200             4                {+, -, *, ÷}
Wdbc              12,000           5                {+, -, *, ÷}
Ionosphere        13,600           6                {+, -, *, ÷}
Sonar             24,000           6                {+, -, *, ÷}
Mux-11            4400             4                {+, -, *, ÷}
Cancer            37,200           6                {+, -, *, ÷, exp, cos}

4.5.  Function  set 

Which functions to include in the function set usually depends on the problem. Since one of our goals is to have
an algorithm as universal and robust as possible, where the user does not need to fine-tune any parameters to achieve good
performance, we decided not to study the effect of different function set choices. The function sets used are chosen to be
close to the default ones most authors use in the literature, and were extracted in all cases from {+, -, *, ÷, exp, cos}. The
benchmark experiments did not use {exp, cos}, since we intended to test the capability of the method to unveil transformations
that did not include functions in the function set. The specific choices for each of the experiments can be seen in
Table  6. 

4.6.  Execution  flow 

Algorithm  2  contains  a  summary  of  the  execution  flow  of  the  GP  procedure,  which  follows  a  classical  evolutionary 
scheme.  It  stops  after  a  user-defined  number  of  generations,  providing  as  a  result  the  best  individual  (i.e.,  transformation) 
it  has  ever  found. 


Algorithm  2.  Execution  flow  of  the  GP  procedure 

1.  Randomly  create  the  initial  population  by  applying  the  context-free  grammar  presented  in  Subsection  4.1. 

2.  Repeat  Ng  times  (where  Ng  is  the  number  of  generations) 

2.1  Evaluate  the  current  population,  using  the  procedure  shown  in  Subsection  4.2. 

2.2  Apply  selection  and  crossover  to  create  a  new  population  that  will  replace  the  old  one. 

2.3  Apply  the  mutation  operators  to  the  new  population. 

3.  Return  the  best  individual  ever  seen. 
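
The flow of Algorithm 2 can be sketched in a few lines of Python. This is only an illustrative skeleton, not the beagle-based implementation used in the experiments: the function names (`evolve`, `create_individual`, `evaluate`, `select_and_cross`, `mutate`) are hypothetical, and the toy operators below (binary tournament selection, one-point crossover, Gaussian mutation on lists of floats) are assumptions chosen to keep the example runnable. They stand in for the grammar-based initialization of Subsection 4.1 and the classifier-driven evaluation of Subsection 4.2.

```python
import random


def evolve(create_individual, evaluate, select_and_cross, mutate,
           population_size, n_generations, rng):
    """Generic evolutionary loop following the steps of Algorithm 2."""
    # Step 1: randomly create the initial population.
    population = [create_individual(rng) for _ in range(population_size)]
    best, best_fitness = None, float("-inf")
    # Step 2: repeat Ng times.
    for _ in range(n_generations):
        # Step 2.1: evaluate the current population and track the best ever seen.
        fitnesses = [evaluate(ind) for ind in population]
        for ind, fit in zip(population, fitnesses):
            if fit > best_fitness:
                best, best_fitness = ind, fit
        # Step 2.2: selection and crossover build the replacement population.
        population = select_and_cross(population, fitnesses, rng)
        # Step 2.3: mutation is applied to the new population.
        population = [mutate(ind, rng) for ind in population]
    # Step 3: return the best individual ever seen.
    return best, best_fitness


# --- Toy operators (assumptions, for illustration only) ---
def create_individual(rng):
    return [rng.uniform(-5, 5) for _ in range(3)]


def evaluate(ind):
    # Higher is better; the optimum is the all-zero vector.
    return -sum(x * x for x in ind)


def select_and_cross(population, fitnesses, rng):
    def tournament():
        i, j = rng.randrange(len(population)), rng.randrange(len(population))
        return population[i] if fitnesses[i] >= fitnesses[j] else population[j]

    children = []
    for _ in range(len(population)):
        mother, father = tournament(), tournament()
        cut = rng.randrange(1, len(mother))
        children.append(mother[:cut] + father[cut:])
    return children


def mutate(ind, rng):
    return [x + rng.gauss(0, 0.1) if rng.random() < 0.2 else x for x in ind]


best, best_fitness = evolve(create_individual, evaluate, select_and_cross, mutate,
                            population_size=50, n_generations=30,
                            rng=random.Random(0))
```

In GP-RFD the individuals would be sets of expression trees rather than lists of floats, and `evaluate` would apply the candidate transformation to dataset B and score it with the previously built classifier.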


5.  Experimental  study 

This section is organized in the following way: to begin with, a general description of the experimental procedure is
presented in Subsection 5.1, along with the datasets that we have used for our testing (both the benchmark problems and the
prostate cancer dataset) and, in the benchmarks’ case, the transformations performed on each of them. The parameters
used for each experiment are shown in Subsection 5.2, followed by a presentation of the benchmark experimental results in
Subsection 5.3. Finally, the results obtained on the prostate cancer problem are presented in Subsection 5.4.

5.1. Experimental framework, datasets and transformations

The  goal  of  the  experiments  was  to  check  how  effective  GP-RFD  was  in  finding  a  transformation  over  dataset  B  that  would 
increase  the  provided  classifier’s  accuracy.  To  validate  our  results,  we  employed  a  5-fold  cross  validation  technique  [29].  We 
used  the  beagle  library  [17]  for  our  GP  implementation. 

The experimental study is divided into two parts. In the first one, a synthetic set of tests is built from a few well-known
benchmark datasets. The procedure followed in these experiments was (see Fig. 3 for a schematic representation):

1. Split the original dataset into two halves with equal class distribution.

2. Consider the first half to be dataset A.

3. From dataset A, build a classifier. We chose C4.5 [48], but any other classifier would work in exactly the same way,
because GP-RFD uses the learned classifier as a black box.

4. Apply a transformation over the second half of the original dataset, creating dataset B. The transformations we tested
were designed to check GP-RFD’s performance on different types of problems, including both linear and non-linear
transformations. A description of each of them can be found in the next subsection.

5. The performance of the classifier built in step 3 is significantly worse on dataset B than it is on dataset A. This is the
starting point of the real problem we are emulating.

6. Apply GP-RFD to dataset B in order to evolve a transformation that will create a solution dataset S. Use 5-fold cross
validation over dataset S, so that training and test set accuracy results can be obtained.

7. Check the performance of the step 3 classifier on dataset S. Ideally, it should be close to the one on dataset A, which would
mean GP-RFD has successfully discovered the hidden transformation and inverted it.

Fig. 3. Schematic representation of the experimental procedure with benchmark datasets.
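
The benchmark procedure can be illustrated with a small, self-contained Python sketch. Everything in it is an assumption made for illustration: the toy two-dimensional dataset, the 90-degree rotation, and above all the classifier, which is a single threshold on the first attribute rather than C4.5. GP-RFD itself is not implemented here; the "repair" step simply applies the known inverse rotation, which is the ideal outcome that steps 6 and 7 are checking for.

```python
import math
import random
from collections import defaultdict


def stratified_halves(data, rng):
    """Steps 1-2: split (x, label) pairs into halves with matching class ratios."""
    by_class = defaultdict(list)
    for x, label in data:
        by_class[label].append(x)
    half_a, half_b = [], []
    for label, xs in by_class.items():
        rng.shuffle(xs)
        mid = len(xs) // 2
        half_a += [(x, label) for x in xs[:mid]]
        half_b += [(x, label) for x in xs[mid:]]
    return half_a, half_b


def train_stump(data):
    """Step 3 stand-in: a threshold on the first attribute (NOT C4.5)."""
    highest_negative = max(x[0] for x, label in data if label == 0)
    lowest_positive = min(x[0] for x, label in data if label == 1)
    threshold = (highest_negative + lowest_positive) / 2
    return lambda x: 1 if x[0] > threshold else 0


def rotate(x, degrees):
    """Rotate a 2-D point around the origin."""
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    return (c * x[0] - s * x[1], s * x[0] + c * x[1])


def accuracy(classifier, data):
    return sum(classifier(x) == label for x, label in data) / len(data)


rng = random.Random(0)
points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(400)]
original = [(x, 1 if x[0] > 0 else 0) for x in points]  # linearly separable toy data

dataset_a, second_half = stratified_halves(original, rng)
classifier = train_stump(dataset_a)                                # step 3
dataset_b = [(rotate(x, 90), label) for x, label in second_half]   # step 4
dataset_s = [(rotate(x, -90), label) for x, label in dataset_b]    # ideal repair

acc_a = accuracy(classifier, dataset_a)
acc_b = accuracy(classifier, dataset_b)  # step 5: expected to drop sharply
acc_s = accuracy(classifier, dataset_s)  # step 7: expected to recover
```

The point of the sketch is the shape of the experiment: the classifier is trained once on dataset A and never retrained, and only the data of dataset B is modified.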

The second part of the study is the application of the proposed algorithm to the prostate cancer problem. The steps
followed in this case were:

1.  Consider  each  of  the  provided  datasets  to  be  datasets  A  and  B  respectively. 

2.  From  dataset  A,  build  a  classifier.  Use  5-fold  cross  validation  to  obtain  training  and  test-set  performance  results. 

3.  Apply  GP-RFD  to  dataset  B  in  order  to  evolve  a  transformation  that  will  create  a  solution  dataset  S.  Use  5-fold  cross 
validation  over  dataset  S,  so  that  training  and  test  set  accuracy  results  can  be  obtained. 

4.  Check  the  performance  of  the  step  2  classifier  on  dataset  S.  Ideally,  it  should  be  close  to  the  one  on  dataset  A,  meaning 
GP-RFD  has  successfully  discovered  the  hidden  transformation  and  inverted  it. 

The  selected  datasets  are  summarized  in  Table  2.  A  short  description  and  motivation  for  each  of  the  datasets  follows,  and 
this  subsection  is  concluded  with  the  specification  of  the  transformations  that  were  fabricated  to  test  the  algorithm  on  each 
of  the  benchmark  datasets.  For  the  two-dimensional  problems,  the  transformations  are  also  graphically  represented. 

Note that the transformations in the prostate cancer problem are not specified. This is due to it being a real-world
problem and not a fabricated one, so the implicit transformations in the data were unknown a priori.

• Linear synthetic dataset: we have called the first dataset ‘Linear synthetic’. It was created specifically for this work, with
the idea of having an easily representable, linearly separable dataset to work with. It was chosen to check the performance
of GP-RFD on some simple transformations, without the added difficulty of having a complex original dataset. The dataset
can be seen in Fig. 4. We applied three transformations to this dataset A: rotation, translation and extrusion, and circle.
The transformed datasets (datasets B in the experiments) can be seen in Figs. 5-7 respectively.


Fig.  4.  Linear  synthetic  dataset,  dataset  A. 


Please  cite  this  article  in  press  as:  J.G.  Moreno-Torres  et  al..  Repairing  fractures  between  data  using  genetic  programming-based  feature 
extraction:  A  case  study  in  cancer  diagnosis,  Inform.  Sci.  (2010),  doi:10.1016/j.ins.2010.09.018 




Fig.  5.  Rotation  problem,  transformed  dataset. 


Fig.  6.  Translation  &  extrusion  problem,  transformed  dataset. 


Fig. 7. Circle problem, transformed dataset.

Fig. 8. Tao dataset. This is dataset A, over which the different transformations are applied, and the transformed datasets have to fit to the same classifier this
dataset does.

Fig. 9. Rotated Tao, transformed dataset.



Fig.  10.  Translated  and  extruded  Tao,  transformed  dataset. 


• Tao: the next step to check the usefulness of GP-RFD is starting from a harder dataset. To this end, we chose the Tao
dataset, still a 2-dimensional problem but one where classification is much harder. This dataset is also built artificially [39]. The
dataset can be seen, before any transformations (dataset A), in Fig. 8. Mirroring the transformations applied over the
linear synthetic dataset, we chose to transform the original Tao dataset by rotating it (Fig. 9), or by translating and extruding
it (Fig. 10). The transformations applied to Tao can also be seen in Table 3.

•  UCI  and  ELENA  datasets:  once  GP-RFD  has  been  tested  in  small  (with  a  low  number  of  attributes)  datasets,  it  is  useful  to 
see  how  it  fares  in  bigger  benchmark  problems.  We  chose  a  few  different  datasets  from  the  UCI  database  [4],  as  well  as  the 
ELENA  project  [22]: 

-  Iris:  classification  of  iris  plants  (UCI). 

-  Phoneme:  distinguish  between  nasal  and  oral  sounds  (ELENA). 

-  Wisconsin:  diagnosis  of  breast  cancer  patients  (UCI). 

-  Heart:  detect  the  absence  or  presence  of  heart  disease  (UCI). 

-  Wine:  classification  of  different  types  of  Italian  wines  (UCI). 

-  Wdbc:  determination  of  whether  a  found  tumor  is  benign  or  malignant  (UCI). 

- Ionosphere: radar data where the task is to decide if a given radar return is good or bad (UCI, modified as found in the
KEEL database [1]).

-  Sonar:  distinguishing  between  rocks  and  metal  cylinders  from  sonar  data  (UCI). 

We performed two different experiments on each of the datasets. In the first experiment, the transformation is created
using functions that appear in the function set of the GP procedure (more specifically, one of the attributes is added to itself).
We named this experiment ‘in-set transformation’. The second one transforms the dataset by using functions that do not
appear in the GP function set. The name for this experiment is ‘out-of-set transformation’. The exact details for these
transformations can be found in Table 4. Any attribute not specified as being part of the transformation in the tables is assumed to
be unchanged.

• Multiplexer-11: since GP-RFD should be flexible enough to be able to tackle datasets with nominal attributes, one of these
datasets was included in the testing. In this work, we chose the Multiplexer problem. This is a binary problem where
some of the bits act as address, and the remaining bits are data registers. The correct classification for a given input is
the value of the register pointed to by the address bits. The specific instance used here is Multiplexer-11, a dataset with
11 binary attributes (where the first three act as address, and the remaining eight as registers) and 2^11 = 2048 samples.
Two different transformations were tested: in the first one, one of the address bits was flipped; while in the second
experiment there was an attribute swap, in a circular shift. The details can be found in Table 5.
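
The Multiplexer-11 dataset and the two kinds of transformation can be sketched as follows. The bit ordering (most significant address bit first) and the choice of which attributes are flipped or shifted are assumptions made for illustration; the attributes actually used in the experiments are the ones specified in Table 5. The helper names (`multiplexer_label`, `flip_bit`, `circular_shift`, `agreement`) are hypothetical.

```python
from itertools import product


def multiplexer_label(bits, n_address=3):
    """Value of the data register selected by the address bits."""
    address = int("".join(str(b) for b in bits[:n_address]), 2)
    return bits[n_address + address]


def flip_bit(bits, index):
    """Bit-flip transformation: invert a single attribute."""
    return bits[:index] + (1 - bits[index],) + bits[index + 1:]


def circular_shift(bits, indices):
    """Attribute swap: each selected attribute moves to the next selected position."""
    shifted = list(bits)
    for source, target in zip(indices, indices[1:] + indices[:1]):
        shifted[target] = bits[source]
    return tuple(shifted)


def agreement(dataset):
    """Fraction of samples where the exact multiplexer rule still gives the stored label."""
    return sum(multiplexer_label(bits) == label for bits, label in dataset) / len(dataset)


# Dataset A: every combination of 11 bits, labeled by the multiplexer rule.
samples = list(product((0, 1), repeat=11))
dataset_a = [(bits, multiplexer_label(bits)) for bits in samples]

# Two hypothetical instances of dataset B (attribute choices are assumptions).
dataset_b_flip = [(flip_bit(bits, 0), label) for bits, label in dataset_a]
dataset_b_shift = [(circular_shift(bits, (0, 1, 2)), label) for bits, label in dataset_a]

print(len(samples))                # 2048
print(agreement(dataset_b_flip))   # 0.5
print(agreement(dataset_b_shift))  # 0.625
```

With these assumed attribute choices, the exact multiplexer rule agrees with the stored labels on 0.5 and 0.625 of the transformed samples, which are the same values that Table 7 reports for dataset B in the two Mux11 experiments. This does not by itself confirm that Table 5 uses the same attributes.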

• Prostate cancer: as was explained in Section 3, the solution to this problem is the main motivation for this work. Since we
were provided with data from two real laboratories, there was no need to fabricate any transformations: we chose the
data from one of the laboratories as dataset A and the data from the other as dataset B.

5.2. Parameters

In  this  section,  we  detail  the  parameters  used  for  each  of  the  datasets,  including  both  the  evolutionary  parameters  and  the 
GP  setup.  The  parameters  were  chosen  following  the  rules  detailed  in  Section  4.4. 

As can be seen in Table 6, the population sizes are large. This is mostly due to GP being a technique that traditionally
requires large population sizes to be effective, a factor which is aggravated by the fact that GP-RFD evolves multiple
expression trees simultaneously (one for each attribute in the dataset). We acknowledge this issue provokes long execution times
for some of the experiments, but considered it a secondary concern and did not address it in this work.

5.3.  Experimental  results:  benchmark  problems 

This part presents the results obtained in terms of classifier performance for the benchmark problems, along with a
statistical analysis to evaluate whether GP-RFD is effective.

Table 7 details the performance obtained by the C4.5 classifier on each of the benchmark problems. It includes the
classifier performance, calculated as shown in Subsection 4.2, on:

• Dataset A, which was used to generate the decision tree. A 5-fold cross validation technique was applied, and both
training and test set results are presented.

• Dataset B, which was created by designing an ad hoc transformation.

• Dataset S, which is the result of applying GP-RFD to dataset B, obtaining a transformed dataset where classifier
performance is increased. A 5-fold cross validation technique was applied, and both training and test set results are presented.

The results show that GP-RFD is capable of reversing nearly all of the fabricated transformations, achieving accuracy rates
that are very close to the ones obtained in the original datasets in both training and test performances. GP-RFD has also
proven capable of generalizing well, as can be seen by the small difference between training and test set classification
performances in most cases. However, some of the datasets (which, coincidentally, tend to also behave badly in terms of
generalization when building classifiers) present some generalization issues, leading to the inability to fully solve the
problem dataset.

5.3.1.  Statistical  analysis 

To  complete  the  experimental  study,  we  have  performed  a  statistical  comparison  between  the  classifier  performance  over 
the  following  datasets: 

• Dataset A, from which the classifier was built.

• Dataset B, artificially built by injecting an ad hoc transformation.

• Dataset S-test, the result of applying GP-RFD over dataset B (test-set results).


Table 7
Classifier performance results: benchmark problems.

                                              Classifier performance on dataset ...
Problem                                       A-training  A-test    B         S-training  S-test
Linear synthetic - rotation                   1.00000     1.00000   0.24930   1.00000     1.00000
Linear synthetic - translation & extrusion    1.00000     1.00000   0.34160   1.00000     0.99800
Linear synthetic - circle                     1.00000     1.00000   0.49860   0.96050     0.94400
Tao - rotation                                0.98518     0.93750   0.62924   0.94418     0.94255
Tao - translation & extrusion                 0.98518     0.93750   0.80403   0.95344     0.93192
Iris - in-set functions                       0.97330     0.93333   0.66667   0.99333     0.92000
Iris - out-of-set functions                   0.97330     0.93333   0.60000   0.99000     0.92000
Phoneme - in-set functions                    0.91895     0.84160   0.75204   0.828978    0.769907
Phoneme - out-of-set functions                0.91895     0.84160   0.59141   0.839871    0.804815
Wisconsin - in-set functions                  0.97361     0.93842   0.35380   0.98248     0.93821
Wisconsin - out-of-set functions              0.97361     0.93842   0.88889   0.98321     0.94412
Heart - in-set functions                      0.89630     0.72593   0.45296   0.92778     0.79259
Heart - out-of-set functions                  0.89630     0.72593   0.60000   0.96296     0.72594
Wine - in-set functions                       0.97727     0.89733   0.65556   0.98889     0.90000
Wine - out-of-set functions                   0.97727     0.89733   0.40000   0.96944     0.91111
Wdbc - in-set functions                       0.98571     0.92143   0.57143   0.98839     0.946428
Wdbc - out-of-set functions                   0.98571     0.92143   0.82857   0.98214     0.97500
Ionosphere - in-set functions                 0.98286     0.87429   0.70857   0.98571     0.88571
Ionosphere - out-of-set functions             0.98286     0.87429   0.77714   0.98571     0.857143
Sonar - in-set functions                      0.93939     0.60601   0.61000   0.95500     0.66000
Sonar - out-of-set functions                  0.93939     0.60601   0.51000   0.94750     0.72000
Mux11 - bit flip                              1.00000     0.97070   0.50000   0.96951     0.96667
Mux11 - column swap                           1.00000     0.97070   0.62500   0.97195     0.96765

In [12,18-20] a set of simple, safe and robust non-parametric tests for statistical comparisons of classifiers is
recommended. One of them is the Wilcoxon signed-ranks test [55,51], which is the test that we have selected for the comparison.

This is the non-parametric analogue of the paired t-test; it is therefore a pairwise test that aims to
detect significant differences between two sample means, that is, between the behavior of two algorithms. It is defined as follows: let
d_i be the difference between the performance scores of the two classifiers on the ith dataset out of N_ds datasets. The
differences are ranked according to their absolute values; average ranks are assigned in the case of ties. Let R+ be the sum of ranks
for the datasets in which the first algorithm outperformed the second, and R- the sum of ranks for the opposite. Ranks of
d_i = 0 are split evenly among the sums; if there is an odd number of them, one is ignored:

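\[
\begin{aligned}
R^{+} &= \sum_{d_i > 0} \operatorname{rank}(d_i) + \frac{1}{2} \sum_{d_i = 0} \operatorname{rank}(d_i) \\
R^{-} &= \sum_{d_i < 0} \operatorname{rank}(d_i) + \frac{1}{2} \sum_{d_i = 0} \operatorname{rank}(d_i)
\end{aligned}
\tag{1}
\]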

Let T be the smaller of the sums, T = min(R+, R-). If T is less than or equal to the value of the distribution of Wilcoxon for N_ds
degrees of freedom [59], the null hypothesis of equality of means is rejected; this means that one classifier
outperforms the other, with the associated p-value.
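
The rank sums defined above are straightforward to compute. The following Python sketch is an illustration, not the software used for the analysis; the function name `wilcoxon_rank_sums` is hypothetical, and the sketch assumes that when the number of zero differences is odd, one of them is discarded before ranking. It is applied to the A-test and B columns of Table 7.

```python
def wilcoxon_rank_sums(scores_first, scores_second):
    """Return (R+, R-) for paired performance scores."""
    diffs = [a - b for a, b in zip(scores_first, scores_second)]
    # With an odd number of zero differences, one of them is ignored.
    if sum(1 for d in diffs if d == 0) % 2 == 1:
        diffs.remove(0)
    # Rank by absolute value, assigning average ranks to ties.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    pos = 0
    while pos < len(order):
        end = pos
        while (end + 1 < len(order)
               and abs(diffs[order[end + 1]]) == abs(diffs[order[pos]])):
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = average_rank
        pos = end + 1
    r_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    r_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    # Ranks of zero differences are split evenly between the two sums.
    half_zero = 0.5 * sum(r for d, r in zip(diffs, ranks) if d == 0)
    return r_plus + half_zero, r_minus + half_zero


# A-test and B columns of Table 7, in row order.
A_TEST = [1.00000, 1.00000, 1.00000, 0.93750, 0.93750, 0.93333, 0.93333,
          0.84160, 0.84160, 0.93842, 0.93842, 0.72593, 0.72593, 0.89733,
          0.89733, 0.92143, 0.92143, 0.87429, 0.87429, 0.60601, 0.60601,
          0.97070, 0.97070]
B = [0.24930, 0.34160, 0.49860, 0.62924, 0.80403, 0.66667, 0.60000,
     0.75204, 0.59141, 0.35380, 0.88889, 0.45296, 0.60000, 0.65556,
     0.40000, 0.57143, 0.82857, 0.70857, 0.77714, 0.61000, 0.51000,
     0.50000, 0.62500]

r_plus, r_minus = wilcoxon_rank_sums(A_TEST, B)
print(r_plus, r_minus)  # 275.0 1.0
```

The only negative difference is the Sonar in-set row (0.60601 against 0.61000), which is also the smallest in absolute value and therefore receives rank 1; this reproduces the R+ = 275 and R- = 1 reported for "A-test vs B" in Table 8. The statistic T = min(R+, R-) would then be compared against the critical value for the 23 paired results.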

The  Wilcoxon  signed-ranks  test  is  more  sensitive  than  the  t-test.  It  assumes  commensurability  of  differences,  but  only 
qualitatively:  greater  differences  still  count  for  more,  which  is  probably  desired,  but  the  absolute  magnitudes  are  ignored. 
From  a  statistical  point  of  view,  the  test  is  safer  since  it  does  not  assume  normal  distributions.  Also,  outliers  (extremely 
good/bad  performances)  have  a  smaller  effect  on  the  Wilcoxon  signed-ranks  test  than  on  the  t-test. 

When  the  assumptions  of  the  paired  t-test  are  met,  the  Wilcoxon  signed-ranks  test  is  less  powerful  than  the  paired  t-test. 
On  the  other  hand,  when  the  assumptions  are  not  met,  the  Wilcoxon  test  is  a  better  choice  than  the  t-test.  This  is  because  the 
Wilcoxon  test  can  be  applied  over  the  averaged  results  obtained  by  the  algorithms  in  each  data  set,  without  any  assumptions 
about  the  characteristics  of  the  distribution  of  the  results  obtained. 

A complete description of the Wilcoxon signed ranks test and other non-parametric tests for pairwise and multiple
comparisons, together with software for their use, can be found in the website available at http://sci2s.ugr.es/sicidm/.

As mentioned above, the test was applied to compare the classifier performance in datasets A, B and S. The results
can be seen in Table 8. Note that we compare the results in dataset A against those in S both in terms of training and test sets.
However, since the classifier was not built from dataset B, we consider those results test-set related and compare them with
S-test.

So we can conclude GP-RFD is capable of finding transformations resulting in a new dataset S that

1. Significantly outperforms dataset B in terms of classifier performance.

2. Obtains statistically equivalent results to dataset A, both in terms of training and test sets. Since the classifier was built
from dataset A, this means dataset S is a successful repair of the fracture between datasets A and B, assuming class
distribution did not change. We know this is the case in these experiments due to the way we built datasets A and B, but it
has to be kept in mind when applying the method in other environments.


Table 8
Wilcoxon signed-ranks test results: benchmark problems.

Comparison                  R+       R-       p-Value      Null hypothesis of equality
A-test vs B                 275      1        4.77E-007    rejected (A-test outperforms B)
B vs S-test                 0        276      2.38E-007    rejected (S-test outperforms B)
A-training vs S-training    147.5    128.5    -            accepted
A-test vs S-test            128.5    147.5    -            accepted


Fig. 11. Linear synthetic rotation, problem (L) and solution (R) datasets.

Fig. 12. Linear synthetic translation and extrusion, problem (L) and solution (R) datasets.

Fig. 14. Rotation in Tao, problem (L) and solution (R) datasets.

Fig. 15. Translation and extrusion in Tao, problem (L) and solution (R) datasets.

5.3.2.  Graphical  results 

This  section  presents  graphical  representations  of  some  of  the  obtained  results.  Since  several  of  the  datasets  have  a  high 
number  of  variables  that  make  them  extremely  hard  to  chart  in  a  simple  way,  only  the  results  corresponding  to  the  linear 
synthetic  dataset  (Figs.  11-13)  and  the  Tao  dataset  (Figs.  14  and  15)  are  shown.  To  make  the  visualization  easier,  each  of  the 
solution  datasets  (datasets  S)  is  presented  side-by-side  with  the  corresponding  problem  dataset  (datasets  B).  The  original 
datasets  (datasets  A)  can  be  seen  in  Fig.  4  for  the  linear  synthetic  dataset  and  Fig.  8  for  the  Tao  dataset. 

5.4.  Prostate  cancer  experimental  results 

This section presents the preliminary results for the prostate cancer problem, in terms of classifier accuracy. The results
obtained can be seen in Table 9. In that table, dataset A is the one from the first lab, which was used to build the classifier;
dataset B is the one coming from the second lab; and dataset S is the result of applying GP-RFD.

To  check  whether  the  full  dataset  B  was  needed  to  evolve  an  effective  transformation,  we  also  tested  using  just  half  of  it  to 
train  GP-RFD,  and  the  other  half  to  test  (2-fold  cross  validation).  These  results  are  also  included  in  Table  9. 

The performance results are excellent for a number of reasons. First and foremost, GP-RFD was able to find a
transformation over the data from the second laboratory that made the classifier work just as well as it did on the data from the first lab,
effectively finding the hidden perturbations that prevented the classifier from working accurately.

The  second  positive  conclusion  to  be  obtained  from  the  results  is  the  generalization  power  of  GP-RFD.  As  can  be  observed 
from  the  test  results,  GP-RFD  does  not  ‘cheat’  by  over-learning  on  the  known  data,  and  works  well  when  transforming  new, 
previously  unseen,  samples. 

Third,  the  results  show  GP-RFD  was  capable  of  obtaining  excellent  results  using  just  half  of  the  B  dataset  to  train.  This 
result  highlights  the  power  of  the  method  to  unveil  the  hidden  transformation  from  a  relatively  low  number  of  samples. 

We also performed a Wilcoxon signed-ranks test to evaluate the performance of GP-RFD on the case study problem.
To do so, we used the results from each partition in the 5-fold cross validation procedure. We ran the experiment four
times, resulting in 4 * 5 = 20 performance samples to carry out the statistical test. As before, R+ corresponds to the
first algorithm in the comparison winning, and R- to the second one. Table 10 shows the results.
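
A quick consistency check on these rank sums: with n paired samples, the ranks 1, ..., n are distributed between the two sums (assuming no zero difference had to be discarded), so the two sums must add up to n(n + 1)/2. For the 20 performance samples used here:

\[
R^{+} + R^{-} = \sum_{k=1}^{n} k = \frac{n(n+1)}{2} = \frac{20 \cdot 21}{2} = 210
\]

Every row of Table 10 satisfies this total (210 + 0 and 126 + 84), and when one algorithm wins on all 20 samples the smaller sum is 0.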

The results on the case study problem lead to the same conclusions as those achieved in the benchmark problems. We can then
conclude GP-RFD was capable of repairing the existing fracture between the data from both laboratories. Again, this
conclusion assumes class distribution did not change. It is a given in this case, since we know the class distribution to be equal in
datasets A and B, but it is an issue that has to be kept in mind when applying the method to other problems.

6.  Concluding  remarks 

We have presented GP-RFD, a new algorithm that approaches a common problem in real life for which not many
solutions have been proposed in evolutionary computing. The problem in question is the repairing of fractures between data
by adjusting the data itself, not the classifiers built from it.

We  have  developed  a  solution  to  the  problem  by  means  of  a  GP-based  algorithm  that  performs  feature  extraction  on  the 
problem  dataset  driven  by  the  accuracy  of  the  previously  built  classifier. 


Table 9
Classifier performance results: the prostate cancer problem.

                           Classifier performance in dataset ...
Validation method          A-training  A-test    B         S-training  S-test
5-fold cross validation    0.95435     0.92015   0.83570   0.95191     0.92866
2-fold cross validation    0.95435     0.92015   0.83570   0.95482     0.93223


Table 10
Wilcoxon signed-ranks test results: the prostate cancer problem.

Comparison                  R+     R-     p-Value      Null hypothesis of equality
A-test vs B                 210    0      1.91E-007    rejected (A-test outperforms B)
B vs S-test                 0      210    1.91E-007    rejected (S-test outperforms B)
A-training vs S-training    126    84     -            accepted
A-test vs S-test            84     126    -            accepted


Fig. 16. Tree representation of the expressions contained in a solution to the prostate cancer problem.

We have tested GP-RFD on a set of artificial benchmark problems, where a problem dataset is fabricated by applying an ad
hoc disruption to an original dataset, and it has proved capable of solving all the transformations presented, showing good
performance both in train and, more importantly, test data.

We have also been able to apply GP-RFD to a real-world problem where data from two different laboratories regarding
prostate cancer diagnosis were provided, and where the classifier learned from one did not perform well enough on the other.
Our algorithm was capable of learning a transformation over the second dataset that made the classifier fit just as well as it
did on the first one. The validation results with 5-fold cross validation also support the idea that the algorithm obtains
good results and has strong generalization power.

Lastly,  we  have  applied  a  statistical  analysis  methodology  that  supports  the  claim  that  the  classifier  performance  obtained 
on  the  solution  dataset  significantly  outperforms  the  one  obtained  on  the  problem  dataset. 

There is, however, one point where the proposed method has not been successful. The learned transformations have failed
to provide any information about why the fracture appeared between the data from the two laboratories. We have nevertheless
included a sample of the transformations learned in Appendix A.

Appendix A. Sample solution from the prostate cancer problem

In this appendix, we include a sample of the learned transformations for the prostate cancer problem, presenting the
transformations corresponding to the highest fitness individual ever found. Due to space concerns, only the attributes
relevant to the C4.5 classifier are shown (Fig. 16).

Shining a new light into molecular workings

Francis  L  Martin 

A  technique  to  substantially  increase  the  resolution  and  imaging  area 
of  Fourier-transform  infrared  microspectroscopy,  while  decreasing  the 
amount  of  time  required  for  image  acquisition,  may  augment  the  use 
of  this  technology  in  biomedical  and  environmental  research. 


The application of infrared spectroscopy technologies has gained increasing recognition in recent years as an adjunct
support to more traditional methodologies, especially in cell biology1-3. A report from Hirschmugl, Bhargava and
co-workers, in this issue of Nature Methods4, demonstrates a two orders of magnitude improvement to this technique,
which for the first time genuinely allows the acquisition of intracellular chemical information with better than
micrometer-scale spatial resolution. Additionally, this group provides the scientific community with an experimental
setup for acquisition of minute-by-minute spectral information over the entire mid-infrared region with an excellent
signal-to-noise ratio; this could be used to nondestructively monitor living biological material. In the emerging field
of biospectroscopy, which has seen several pioneering developments over the last two decades5,6, this work has the
genuine potential to act as a bridge for implementing infrared spectroscopy into mainstream biological practice.

Figure 1 | Chemical imaging of different tissue components in a pixel-by-pixel fashion. The method of
Nasse et al.4 could be applied to different applications within normal-looking tissue architecture such
as identifying early cancerous cells or the in situ location of stem cells required to regenerate a tissue.
Fast imaging at high spatial resolution (<1 micrometer) in living tissue would even allow for analyses
of cell membranes or organelles. Different components would be identified on the basis of location of
their unique spectral signatures.

Francis L. Martin is at the Centre for Biophotonics, Lancaster Environment Centre, Lancaster University,
Lancaster, UK.
e-mail: f.martin@lancaster.ac.uk

Fourier-transform infrared (FTIR) spectroscopy for biological, environmental or biomedical applications exploits the
fact that biomolecules absorb in the mid-infrared frequency range in a manner consistent with the chemical-bond
composition of the interrogated sample. Based on the absorbance pattern of the present chemical bonds with an electric
dipole moment that changes during vibration, signature spectra are derived. Applied in imaging format, different FTIR
microscopy platforms are available to obtain such chemical information, from a benchtop instrument that can be found in
a typical physics or chemistry department to an infrared beamline in one of the 50 or so synchrotron facilities
worldwide. A benchtop instrument delivers reasonable spectral information but limited spatial resolution; for instance,
it might allow one to derive an integrated tissue spectral signature. A synchrotron system typically yields a higher
signal-to-noise ratio and can also approximate single-cell spatial resolution.

Of course, not everyone has ready access to a synchrotron facility (which typically requires applications for beam time)
and often such facilities might not be developed sufficiently to integrate the requirements of a biological laboratory.
The time-consuming nature of individual experiments within a limited beam time allocation currently minimizes the number
of replicates achievable, which lessens the robustness of the findings. Consequently, the approach of infrared
spectroscopy to biological or environmental questions often appears exotic and niche.

FTIR spectroscopy has long been applied
in the physical sciences as it yields chemical
information about a sample. To apply the
technique to biological specimens (from
sample preparation, to understanding the
limitations of the system at hand, to processing
and interpreting the acquired spectra), the
practitioner requires a truly interdisciplinary
approach7. Infrared microspectroscopy
data collection has been slow, and the spatial
resolution has been poor. In addition, as
computer processing capabilities have increased,
spectral datasets have grown increasingly
large; thus, developing and implementing
appropriate computational algorithms capable
of extracting relevant biomarkers is yet
another cross-discipline hurdle8. As we better
understand the nature of such derived spectra
and the underlying physical phenomena that
may modify their structure, our processing of
them, and hence the information we extract
from them, will undoubtedly evolve9. Within
this interdisciplinary milieu lies the conundrum
of why infrared microspectroscopy
remains so under-exploited in the cell biology
arena.

The advantage of applying this technology
in an imaging format is that neither labeling
nor staining of the sample is required.
Whereas with other microscopy methods
one needs a priori knowledge of the sample
to be interrogated to facilitate the tracking of
a prescribed biomolecule, infrared
microspectroscopy is not limited by this
restriction and is a discovery-based method.
However, at the same time, the inability to
track specific molecules with infrared
microspectroscopy means
that it should be used as a complementary tool
to optical microscopy, not as a replacement
for it.

The approach implemented by Hirschmugl,
Bhargava and co-workers4 that allows spectral
imaging at high resolution over a wide
surface area (a tissue section containing
several glandular elements) in a short time
frame (within minutes) may substantially
increase the application of FTIR microspectroscopy
in biological research, allowing it
to provide complementary information to
that of optical microscopy techniques. They
achieved this by harnessing multiple synchrotron
beams to a focal plane array detector;
the latter is an imaging device consisting
of an array of light-sensing pixels placed at
the lens's focal plane.


The fact that FTIR microscopy image
data can now be acquired in minutes rather
than hours or days in an evolving live-cell
context opens up new avenues of investigation,
from tracking the cell differentiation
process2, to gaining new insights into cell
cycle kinetics, to nanoparticle-induced toxicity
mechanisms, to looking at host-parasite
interactions10. In addition, because infrared
imaging is nondestructive, one can
match the point-by-point spectral
information to more conventional staining
approaches that can be subsequently
applied; thus, one might envision imaging a
tissue section containing a glandular element
suspended in stroma and then matching the
location of spectral signatures of suspect
transformed cells or quiescent stem cells
to more conventional biomarkers (Fig. 1).
This study4 applies the approach to sections
of prostate and breast tissue; the resultant
images show remarkable clarity at a spatial
resolution hitherto unachievable with infrared
spectroscopy and over a wider area with
a much shorter acquisition time.

This advance now leads the way to stimulating
the development of new infrared microscopy
systems for routine use. Currently, new
light sources, such as quantum cascade lasers,
are being developed that are expected to be
capable of generating the requisite brilliance
of light in a benchtop system and thus will
have the potential to generate a similar level
of spectral quality and spatial resolution as
a synchrotron-harnessed system, which
will be necessary to truly make infrared
microspectroscopy a core technology in the
biology laboratory.

Additionally, the technique may have
important clinical and environmental applications;
infrared microspectroscopy will
potentially allow a pathologist to examine an
image of cellular architecture as well as reference
the underlying signature chemical information
inherent in this picture. Although
traditional methods such as hematoxylin-and-eosin
staining are ingrained in clinical
practice, such approaches are often fraught
with subjectivity and lack molecular detail
such as conclusive evidence of the earliest
predisease alterations. Infrared approaches
allow for objective insight, which could
facilitate earlier diagnosis and thereby bring
enormous societal benefits. In environmental
research, there is an urgent need for
new approaches to monitor sentinel organism
effects after complex exposures; again,
computational algorithms can be exploited
to extract such mechanistic information after
spectral analyses11.

Galileo used an optical telescope to peer
into the universe. In the twentieth century,
astronomers began to use infrared telescopes.
Likewise, biologists in a multiplicity of different
disciplines have used optical microscopes
to understand biological architecture and
function. Infrared microscopes shine a new
light into the microverse that is the biological
cell, allowing one to visualize processes
differently or to identify novel components
that may facilitate better explanation or
understanding. This is not to say that infrared
microscopy will one day replace optical
microscopy or other conventional methodologies,
but it may explain phenomena that
hitherto would not have been explained using
traditional approaches.

The  pioneering  work  by  Hirschmugl, 
Bhargava  and  co-workers4  adds  to  the  impetus 
toward  developing  benchtop  instruments  with 
similar capability in the biological laboratory.

THE AUTHOR FILE


Rohit  Bhargava  and 
Carol  Hirschmugl 

Multiple synchrotron beams make infrared
imaging faster and clearer.


Carol Hirschmugl and Rohit Bhargava


Rohit Bhargava expected that 12 beams from a synchrotron
would boost the quality of his imaging data,
but he still was not prepared for what he saw. Bhargava,
a bioengineer at the University of Illinois at Urbana-Champaign,
uses a technique called infrared spectroscopic imaging,
which detects various chemical groups in a sample
based on their absorbance of infrared light. It is a specialized
technique, valued not for its ability to produce
stunning pictures but because it can yield molecular-level
information without the need for labels. Normally, though, that
means detecting lipids, fats and carbohydrates with a
resolution of 5 micrometers or worse. Hence Bhargava’s
surprise at results from the new technique: “You could
start to see details that you are used to seeing only in
optical microscopy,” he recalls. “The crispness, the
details were comparable.” In fact, the pixel size is only
a half micrometer in diameter, a hundredth the size of
current state-of-the-art infrared imaging.

“Even to this day, every time we take data, I’m shocked
by the quality of the images,” says Carol Hirschmugl, a
physicist at the University of Wisconsin-Milwaukee who,
with Michael Nasse and others, developed a technique
that uses multiple synchrotron beams to illuminate samples
with infrared light. Not only is resolution improved,
the imaging technique is also faster than methods that
rely on single beams or on heat sources to produce infrared
radiation. Data that would normally take 11 days to
collect can now be acquired in 20 minutes.

The  idea  for  the  project  began  when  Hirschmugl 
learned  about  a  ‘weekend  experiment’  at  Brookhaven 
National  Laboratories.  A  team  of  scientists  set  four 
beams  onto  nonbiological  samples  and  showed  that 
resolution  improved,  she  says,  but  they  did  not  take  the 
project  further  to  complex  biological  samples,  largely 
because  getting  access  to  a  beamline  is  difficult. 

Hirschmugl was intrigued, so she approached the scientists
at the Synchrotron Radiation Center at the University
of Wisconsin-Madison, which was built to produce beams
with considerably less noise than other synchrotrons. The
director offered access to a bank of 12 beams, provided
Hirschmugl could get the necessary funding.

When  the  funding  came  in,  she  and  the  engineers 
got  access  to  the  beamline  within  2  years  (a  timeline  so 
short Hirschmugl refers to it as “miraculous”). To figure
out how to harness the setup for infrared imaging,
the  team  spent  weeks,  sometimes  working  “morning 
to  morning,”  testing  algorithms  to  align  the  beams  and 
focus  the  48  mirrors. 

Their  technique  could  be  used  to  take  great  pictures 
of  polystyrene  beads,  but  to  learn  whether  the  method 
would  be  useful  for  biological  imaging,  Hirschmugl’s 
team  had  to  find  a  biologist  who  could  ask  the  right 
kinds  of  questions.  That  led  her  to  Bhargava,  who  had 
worked  on  some  of  the  earliest  prototypes  of  infrared 
microscopy,  including  theoretical  research  on  how  to 
acquire  data  to  get  informative  images. 

“Out  of  the  blue  I  got  a  call  from  Carol,”  Bhargava 
recalls.  She  said  she  had  an  interesting  instrument  and 
invited  him  to  try  it  out.  “It  was  very  clear  from  the 
theoretical work that this would be something different,”
he says, “but I couldn’t have anticipated the nice
results  we  would  get.”  For  the  first  time,  the  researchers 
could  distinguish  the  collagen-dense  interface  between 
epithelial  and  stromal  cells  using  infrared  imaging,  and 
could  distinguish  between  cancerous  and  healthy  tissue 
in  fixed  slides  of  prostate  and  breast  samples. 

But  that  is  just  the  beginning,  the  collaborators 
say.  Any  samples  that  have  chemical  organization  at 
the  micrometer  scale  can  be  imaged  in  this  facility: 
projects under way include studying stem-cell differentiation,
malarial parasites inside cells and even
the pigment and oil layers of 500-year-old paintings.
Theory-based research can also expand. Work developed
for wide-field imaging with optical techniques
can be applied to infrared imaging, and experiments
on the synchrotron may show ways to improve desktop
infrared imaging instruments.

Hirschmugl  plans  to  invite  more  collaborators  to 
the  facility  and  even  to  build  facilities  onsite  to  enable 
experiments  on  living  cell  cultures.  That,  however, 
may  depend  on  new  sources  of  funding:  in  the  same 
month  this  paper  in  Nature  Methods  was  accepted  for 
publication,  the  US  National  Science  Foundation  cut 
funds  for  the  Synchrotron  Radiation  Center.  “Now  that 
we have these beautiful results, [the National Science
Foundation] is not funding the running of the synchrotron,”
says Hirschmugl. “It’s been an up and down
time.” But perhaps, she says, it represents a new opportunity;
she and colleagues are looking for funding
sources  to  reinvent  the  Synchrotron  Radiation  Center 
as  a  dedicated  infrared  imaging  center. 

Monya  Baker 

Nasse, M.J. et al. High-resolution Fourier-transform
infrared chemical imaging with multiple synchrotron beams.
Nat. Methods 8, 413-416 (2011).

High-resolution Fourier-transform infrared chemical
imaging with multiple synchrotron beams

Michael  J  Nasse1,2,  Michael  J  Walsh3,  Eric  C  Mattson1, 
Ruben Reininger4, Andre Kajdacsy-Balla5,
Virgilia Macias5, Rohit Bhargava3 & Carol J Hirschmugl1

Conventional  Fourier-transform  infrared  (FTIR) 
microspectroscopic  systems  are  limited  by  an  inevitable 
trade-off  between  spatial  resolution,  acquisition  time,  signal- 
to-noise  ratio  (SNR)  and  sample  coverage.  We  present  an 
FTIR  imaging  approach  that  substantially  extends  current 
capabilities  by  combining  multiple  synchrotron  beams  with 
wide-field  detection.  This  advance  allows  truly  diffraction- 
limited  high-resolution  imaging  over  the  entire  mid-infrared 
spectrum  with  high  chemical  sensitivity  and  fast  acquisition 
speed  while  maintaining  high-quality  SNR. 

Stains and labels to enhance contrast in microscopy have been used
for many years, leading to many important discoveries. However,
their use is often time-consuming and cumbersome, can perturb
the function of drugs or small metabolites or may be cytotoxic.
In contrast, label-free chemical imaging requires no artificial
modification of biomolecules or additional sample preparation
and permits a comprehensive characterization of heterogeneous
materials1. Chemical imaging is generating considerable interest
for biomedical analysis as dyes or stains are not required for
contrast and substantial chemical and structural information can
be extracted without prior knowledge of molecular epitopes or
manual interpretation. Vibrational spectroscopic techniques,
including both mid-infrared absorption and Raman scattering-based
imaging, permit molecular analyses without perturbation.
Spontaneous Raman scattering relies on a very weak effect and
therefore involves a trade-off between measurement time and
sensitivity, potentially leading to photoinduced sample damage.
Emerging instrumentation2 involving nonlinear Raman contrast
has considerably extended imaging capabilities beyond these traditional
trade-offs, and exciting work is underway to carefully match
lasers and reject spurious backgrounds (for example, in coherent
anti-Stokes Raman scattering) and to extend wavelength
coverage and speed (for example, in stimulated Raman scattering).
Conversely, the strong mid-infrared absorption contrast makes
infrared spectroscopy and microscopy a straightforward, nondestructive,
label-free chemical contrast modality with broad
applications1,3 ranging from the analysis of graphene-based materials,
pharmaceuticals, volcanic rocks and biominerals to applications
in forensics and art conservation, among others. Infrared
spectroscopic tools are particularly interesting for applications in
biomedical fields such as marine biology, cancer research, stem
cells (for example, to delineate cell mechanisms or lineage), real-time
monitoring of live cells, Alzheimer’s disease, malaria parasites
and more3 (Online Methods).

Infrared instrumentation, however, has stagnated mostly owing
to spectral-spatial trade-offs. Commonly, low-brightness thermal
sources and synchrotron sources are used for Fourier-transform
infrared (FTIR) microspectroscopy. Synchrotron sources yield
stable, broadband and high-brightness radiation, making them
excellent for FTIR microspectroscopy, but the flux of conventional
single-beam beamlines is limited by the relatively small
horizontal collection angle, and the resulting comparatively small
source etendue makes them challenging to use with wide-field
imaging, which is characterized by a relatively large acceptance or etendue
(Supplementary Note 1). Here we used multiple synchrotron
beams with a wide-field detection scheme. This allowed us to
acquire truly diffraction-limited, high-spatial-resolution infrared
images of high spectral quality with outstanding speed, considerably
extending the potential of infrared microscopy.

For an optical system permitting diffraction-limited imaging,
spatial resolution is defined as the capacity to separate two adjacent
(point-like) objects. To achieve the highest (diffraction-limited)
resolution, an objective with the largest possible numerical aperture
(NA) should be used, and the instrument’s signal-to-noise
ratio (SNR)4,5 should be optimized. Also, it is indispensable to
match the image pixelation to the NA of the objective using the
appropriate spatial sampling or pixel size. Too-large pixels inevitably
lead to resolution loss, whereas smaller pixels do not improve
the resolution further. A detailed analysis4 (Online Methods)
shows that, assuming the largest commercially available NA of
~0.65, diffraction-limited resolution over the entire mid-infrared
spectrum can only be achieved with an effective pixel spacing
not larger than ~λ/4 or ~0.6 µm for the shortest wavelength of
interest (λ = 2.5 µm).
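
The sampling requirement above can be checked with a few lines of arithmetic. The sketch below assumes the Rayleigh criterion (0.61 λ/NA) as the resolution measure and takes the ~λ/4 pixel-spacing bound quoted in the text at face value; the function names are illustrative and do not come from the paper.

```python
# Illustrative check of the pixel-spacing requirement described in the text.
# Assumptions (not from the paper's code): Rayleigh criterion for resolution,
# and the ~lambda/4 spacing bound quoted for NA ~0.65.

NA = 0.65  # largest commercially available numerical aperture cited in the text


def rayleigh_resolution_um(wavelength_um: float, na: float = NA) -> float:
    """Smallest resolvable separation under the Rayleigh criterion."""
    return 0.61 * wavelength_um / na


def max_pixel_spacing_um(wavelength_um: float) -> float:
    """Upper bound on effective pixel spacing (~lambda/4, as quoted)."""
    return wavelength_um / 4.0


if __name__ == "__main__":
    for wavelength in (2.5, 5.0, 10.0):
        print(
            f"lambda = {wavelength:4.1f} um: "
            f"Rayleigh ~ {rayleigh_resolution_um(wavelength):.2f} um, "
            f"max pixel spacing ~ {max_pixel_spacing_um(wavelength):.3f} um"
        )
```

At λ = 2.5 µm the bound evaluates to 0.625 µm, consistent with the ~0.6 µm figure; longer wavelengths tolerate proportionally larger pixels, so the shortest wavelength sets the requirement.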

One approach to infrared microscopy uses a single-element
detector and confocal-like apertures to localize light incident
on the sample. In this configuration, pixel size is given by the
raster-scanning step size4. Apertures of dimension a only deliver
diffraction-limited resolution6 when λ > a. For λ < a, diffraction-limited
resolution6 is not attained, whereas for longer wavelengths
the throughput decays rapidly. This trade-off between
resolution and throughput (or SNR) is particularly penalizing
for infrared microspectroscopy because of the broad bandwidth.
In practice, reasonable SNR limits the smallest aperture
for the illumination at the sample plane to ~10 µm × 10 µm for a
thermal source6 and, in a few demonstrations7, down to ~3 µm ×
3 µm for synchrotron sources. The small aperture transmissivity
of only a few percent makes point-by-point
sampling systems very inefficient
because of the dual need for signal
averaging to obtain high SNR and rastering
a small pixel size to acquire data, leading
to exceedingly long acquisition times.
These trade-offs make sequential point
sampling impractical for micrometer-scale
aperture sizes and sub-micrometer-scale
raster step sizes (necessary for correct
spatial sampling4) to achieve diffraction-limited
maps. For example, it takes 2-4 h
to acquire an area of only 30 µm × 30 µm
as a fully diffraction-limited map at a
state-of-the-art third-generation synchrotron7
equipped with a conventional confocal
system. Lengthy collection times, in
most practical cases, lead experimenters
to choose larger aperture and step sizes,
thereby compromising the achievable
spatial resolution. In contrast, our
system can cover this area in under a
minute without compromising the spatial sampling required
for diffraction-limited resolution.

Figure 1 | FTIR imaging with a multibeam synchrotron source. (a) Schematic of the experimental setup.
M1-M4 are mirror sets. (b) A full 128 × 128 pixel FPA image with 12 overlapping beams illuminating
an area of ~50 µm × 50 µm. Scale bar, 40 µm. (c) A visible-light photograph of the 12 beams projected
on a screen in the beam path (dashed box in a). Scale bar, ~1.5 cm. We display the beams as one beam
from then on in the schematics. Each beam exhibits a shadow cast by a cooling tube upstream, which
is not shown in a. (d) Long-exposure photograph showing the combination of the 12 individual beams
into the beam bundle by mirrors M3 and M4. Scale bar, ~20 cm.

Figure 2 | Chemical images from various FTIR systems. (a-d) The
same cancerous prostate tissue section (area, ~280 µm × 310 µm)
measured with different instruments, using the integrated absorbance
of the CH-stretching region (2,800-3,000 cm⁻¹), without dyes or
stains. We processed all images identically (baseline correction only)
and used the same color scale (color bar in a; AU, absorbance units).
Scale bars, 100 µm and in insets, 10 µm. Images acquired with a
conventional table-top system (PerkinElmer Spotlight) equipped
with a thermal source in raster-scanning mode (10 µm × 10 µm; a)
and linear array mode (6.25 µm × 6.25 µm; b), with an FTIR imaging system (Varian Stingray) equipped with a 64 pixel × 64 pixel FPA (5.5 µm × 5.5 µm
per pixel at the sample plane; c) and with our multibeam synchrotron-based imaging system (pixel size, 0.54 µm × 0.54 µm; d). (e) Hematoxylin and
eosin (H&E)-stained prostate tissue (diameter, 0.75 mm). Scale bar, 100 µm. Dashed box specifies the corresponding area of a serial, unstained section
from which we generated images in a-d. (f) Typical unprocessed spectra from a single pixel acquired with each instrument (crosshairs in a-d indicate
corresponding pixel positions in the infrared images).

We based our approach on the more recent strategy of wide-field
imaging using multichannel focal plane array (FPA) detectors8-10,
in which no lossy apertures are used. This increases spatial
coverage and imaging speed greatly, but the SNR using a thermal
source limits pixel sizes to ~5 µm × 5 µm at the sample plane.
Achieving a pixel size ~100 times smaller to correctly sample the
diffraction-limited illumination is very ineffective, resulting in a
~100-fold lower SNR (Supplementary Fig. 1) and thus in a ~10⁴-fold
longer scanning time8. Hence, to our knowledge there are no
reports of a true diffraction-limited FTIR imaging system with a
thermal source.
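
The scaling argument above follows from two common assumptions, stated here as assumptions rather than as the authors' derivation: for a fixed source, the signal reaching a pixel falls in proportion to the pixel area, and SNR improves only with the square root of the co-added measurement time. A minimal sketch:

```python
# Back-of-envelope scaling of SNR and acquisition time with pixel size.
# Assumptions (illustrative): signal per pixel scales with pixel area, and
# SNR grows as the square root of the co-added measurement time.


def snr_penalty(old_pixel_um: float, new_pixel_um: float) -> float:
    """Factor by which SNR drops when the pixel edge length shrinks."""
    return (old_pixel_um / new_pixel_um) ** 2


def time_penalty(snr_loss: float) -> float:
    """Extra averaging time needed to win back a given SNR loss."""
    return snr_loss ** 2


if __name__ == "__main__":
    loss = snr_penalty(5.5, 0.54)  # pixel edge lengths quoted in the text
    print(f"SNR penalty  ~ {loss:.0f}x")
    print(f"time penalty ~ {time_penalty(loss):.1e}x")
```

Going from 5.5 µm to 0.54 µm pixels gives an SNR penalty of roughly 100 and a time penalty on the order of 10⁴, which is why brighter illumination, rather than longer averaging, is the practical route to small pixels.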

In  2006  independent  groups11-13  pioneered  the  coupling  of 
a  synchrotron  beam  with  an  FPA  detector,  which  is  not  obvious 
because  wide-field  illumination  seems  incompatible  with  a  small, 
low-emittance  synchrotron  beam.  These  groups  demonstrated  that, 
with  a  single  synchrotron  beam,  a  local  region  of  the  FPA  can  be 
illuminated,  and  that  this  region  yielded  increased  SNR  compared  to 
thermal  sources.  This  inhomogeneous  illumination,  however,  means 
that  either  a  relatively  small  FPA  (and  thus  sample  area)  must  be 
used or that the acquisition time must be increased to compensate
for the inhomogeneous illumination. This coverage-SNR trade-off has
hampered  the  use  of  synchrotron-based  technology:  only  one  recent 
publication14  uses  a  single  synchrotron  beam  with  an  FPA. 

Here we present an infrared imaging system specifically
designed and optimized to overcome these limitations by coupling
multiple low-emittance synchrotron beams with a large FPA
detector. We extracted a large fan of radiation from a dedicated
bending magnet, split it into 12 beams and subsequently rearranged
these into a 3 × 4 matrix beam bundle to illuminate a
large field of view in the sample plane (Fig. 1). We engineered the
matrix to achieve homogeneous illumination over areas of up to
52 µm × 52 µm (96 pixels × 96 pixels; Fig. 1b and Supplementary
Fig. 2) with each pixel corresponding to 0.54 µm × 0.54 µm at
the sample plane. This pixel size, ~100 times smaller than that of
conventional thermal or synchrotron systems, is smaller than the
maximum pixel size allowed for correct spatial sampling (oversampling)
so that diffraction-limited images even at the smallest
wavelength of interest (2.5 µm) are possible (Online Methods).
Although we designed this system explicitly for acquisition in
transmission mode, it also yields equivalent quality images in
reflection mode (Supplementary Figs. 3 and 4).

Figure 3 | High-resolution multibeam synchrotron FTIR imaging. (a) Hematoxylin
and eosin (H&E)-stained image of cancerous prostate tissue with chronic inflammation
obtained using visible light microscopy. (b,c) Multibeam synchrotron absorbance
images obtained from an unstained serial section of the sample shown in a. Spatial
detail in images from the new system is highlighted for lymphocytes (blue
arrow) and red blood cells (red arrow). (d) Image of the same unstained section
imaged with a conventional table-top system (PerkinElmer Spotlight, linear array
mode). (e) Expanded views of the boxed area in b showing the typical appearance
of lymphocytes in H&E stained samples (top), the new system (bottom left) and a
conventional table-top instrument (bottom right). (f) H&E-stained visible light image
(top), asymmetric CH-stretching (2,840 cm⁻¹, center) and collagen-specific (1,245
cm⁻¹, bottom) infrared images of an unstained section of normal breast tissue
(terminal ductal lobular unit region). Epithelial (green arrow) and intralobular stromal regions (magenta arrow) are highlighted. (g) Spectra of epithelial
and stromal cells recorded with a multibeam synchrotron versus a thermal source. (h) Absorbance image (2,840 cm⁻¹; top) of an unstained cancerous
prostate tissue showing two benign prostate glands. Inset, potential presence of basement membrane at the interface between stroma and epithelium
is marked (arrows). Image (bottom) showing epithelial (green) and stromal (magenta) cells classified using previous algorithms. (i) Average spectra
from epithelial, stromal (two each: one closer to the interface, one farther away), and interface pixels identified manually from data obtained using two
different instruments. AU, absorbance units. Scale bars, 50 µm.

To test this approach, we compared data from the same prostate
tissue using various state-of-the-art infrared imaging systems
(Fig. 2 and Supplementary Fig. 1). None of the other instruments
provided diffraction-limited resolution at all wavelengths
(Fig. 2a-c). Raster-scanning the area shown in Figure 2a-d
(~280 µm × 310 µm) at diffraction-limited resolution using a
synchrotron-based dual-aperture microscope would require over
11 d. In contrast, using our technique we recorded the same area
(Fig. 2d) in ~30 min (16 scans). The spectral quality was essentially
identical (Fig. 2f) to that of the best commercial systems,
despite the ~100-fold pixel area reduction. This pixel size provided
the additional spatial detail (Fig. 2) necessary for infrared
imaging to become competitive with optical microscopy in biomedical
applications. In another example, wide-field multibeam
synchrotron imaging revealed lymphocytes (diameter, ~2-7 µm)
and other tissue features that were clearly visible in hematoxylin
and eosin-stained images (the clinical gold standard for diagnosis;
Fig. 3a-c). The same visualizations were impossible using
conventional table-top infrared systems (Fig. 3d,e). The contrast
in these images can be used to color-code images into constituent
cell types15; hence the capability of our technique opens up the
possibility of subcellular classification.
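
The acquisition times quoted above can be roughly reproduced from figures given elsewhere in the text. The sketch assumes that raster-scan time scales linearly with mapped area (using the 2-4 h per 30 µm × 30 µm map cited earlier) and that the multibeam system covers the sample with non-overlapping tiles of its 52 µm × 52 µm homogeneous field of view; both are simplifications made for illustration, and the function names are hypothetical.

```python
import math

# Rough acquisition-time comparison for the 280 um x 310 um tissue area.
# Assumptions (illustrative): raster time scales linearly with area, and the
# wide-field system covers the area with non-overlapping 52 um tiles.

AREA_UM = (280.0, 310.0)


def raster_days(hours_per_map: float, map_edge_um: float = 30.0) -> float:
    """Days needed to raster the full area at a given time per small map."""
    n_maps = (AREA_UM[0] * AREA_UM[1]) / (map_edge_um ** 2)
    return n_maps * hours_per_map / 24.0


def widefield_tiles(fov_um: float = 52.0) -> int:
    """Number of field-of-view tiles needed to cover the full area."""
    return math.ceil(AREA_UM[0] / fov_um) * math.ceil(AREA_UM[1] / fov_um)


if __name__ == "__main__":
    print(f"raster scan: {raster_days(2):.0f} to {raster_days(4):.0f} days")
    tiles = widefield_tiles()
    print(f"wide-field: {tiles} tiles, ~{30 / tiles:.1f} min per tile")
```

Under these assumptions the raster estimate brackets the quoted figure of more than 11 days, and 36 tiles in roughly 30 minutes corresponds to under a minute per tile.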

Furthermore, pixel localization also improved spectral purity of
data extracted from images. The hematoxylin and eosin contrast
was well-reproduced with our technique using simple absorption
features, and epithelial and stromal regions were clearly delineated
without staining (Fig. 3f). The additional detail in synchrotron
wide-field images allowed relatively limited cross-contamination
of spectra from both intralobular stromal and epithelial regions.
Although we expected these characteristic spectra to be different,
spectra from the thermal source systems, limited by their pixel size,
showed substantial overlap, whereas the multibeam synchrotron system
provided distinct spectra (Fig. 3g). Using our technique, we also
classified an infrared image of prostate tissue into constituent cell types
(Fig. 3h). Although it is well-known that the basement membrane
lies at the interface of epithelial and stromal cells and is critical in
diagnosing lethal cancer, the basement membrane is not discernible
in images from thermal systems. We classified infrared tissue
images into cell types15, and identified the interface between
the epithelial and stromal cells (Fig. 3h). Thermal source spectra
from these regions were an average of epithelial and stromal pixels,
whereas interface spectra extracted from the synchrotron image
were distinct from both contributions (Fig. 3i), which, with the
higher collagen triplet absorption, was suggestive of the basement
membrane. Additional investigations are in progress.

To validate the optical capability of our system, we recorded
images of a 1951 US Air Force test target5 (Supplementary
Figs. 3a,b and 4). We used line profiles5 (Supplementary Fig. 3e-h)
to determine the contrast for each pattern, quantitatively confirming
that our system reached and exceeded (Supplementary
Note 2) the Rayleigh resolution criterion and delivered diffraction-limited
images over the entire mid-infrared bandwidth.
Furthermore, spatial oversampling at all wavelengths and high
SNR, as offered by our system, are a prerequisite12,13 for developing
computational resolution enhancement techniques. We
implemented a spatial deconvolution algorithm (Supplementary
Note 3) based on (wavelength-dependent) measured point-spread
functions (Supplementary Figs. 5 and 6). The increased contrast
and resolution of the deconvolved US Air Force target sample
images were apparent in the line profiles (Supplementary
Fig. 3c-h). Furthermore, measurements of ~1 µm polystyrene
beads confirmed that our system reached a spectral limit of detection
of 6 ± 1 fmol (mass, 600 ± 100 fg; and volume, 0.6 ± 0.1 fl)
in a single 0.54 µm × 0.54 µm pixel (Supplementary Fig. 7). We
estimated that this limit is about two orders of magnitude finer
than that of present instrumentation16.

The  use  of  multiple  synchrotron  beams  enabled  us  to  achieve  a 
homogeneously  high  SNR  over  a  large  FPA  area,  which  improved 
sample  coverage  and  acquisition  speed  compared  to  conventional 
thermal or synchrotron-based systems and enabled high diffraction-limited
spatial resolution over the entire mid-infrared spectrum. The
improvement in acquisition time opens the way to real-time noninvasive
and label-free live-cell imaging. We hope that our technique
spurs the community to develop appropriate optical designs for tabletop
instruments and provides a rationale for laser-based systems and
other  multibeam  synchrotron-based  imaging  beamlines. 

METHODS 

Methods  and  any  associated  references  are  available  in  the  online 
version  of  the  paper  at  http://www.nature.com/naturemethods/. 

Note:  Supplementary  information  is  available  on  the  Nature  Methods  website. 
ONLINE METHODS

Requirements for diffraction-limited resolution. Mid-infrared
spectroscopy and microscopy have very broad applications in many
scientific fields, ranging from fundamental and applied research
to engineering and biology15-29. Infrared microspectroscopy in
particular  can  contribute  to  the  biomedical  sciences  because  of 
its  noninvasive  spatially  resolved  chemical  specificity.  Here  we 
describe  the  requirements  to  obtain  diffraction-limited  spatial 
resolution  with  a  mid-infrared  microscope. 

Spatial resolution can be quantified, for example, by the
Rayleigh5 criterion as d = 0.61 λ / NA, in which d is the minimum
distance between two adjacent (point-like) objects that
are just resolved (the factor 0.61 is strictly valid only for lenses
without obscuration and smaller for Schwarzschild optics; see
Supplementary Note 2). But achievable spatial resolution is not
only dependent on the wavelength and the NA of the objective
via the Rayleigh criterion but also on the pixel size, that is, the
objective's magnification and the SNR of the imaging system4. To
observe diffraction-limited performance, a spatial sampling of at
least ~8 pixels4 per Airy pattern is required to achieve sufficient
contrast. Smaller pixel sizes (oversampling) do not improve the
resolution, which is then limited by diffraction, whereas larger
pixels unavoidably deteriorate contrast and thus resolution
(undersampling). For the smallest wavelength (2.5 μm) using an
NA of 0.65, we need a pixel size not larger than 1.22 × 2.5 μm /
0.65 / 8 = 0.59 μm. Even the less restrictive Nyquist theorem yields
a maximum pixel size of 1 / (2.3 f_cutoff) = 0.84 μm (usually 2.3 is
used instead of the theoretical 2 suggested by Nyquist to account
for factors such as noise in real optical systems30), where f_cutoff =
2 NA / λ is the spatial cutoff frequency, equivalent to the Sparrow
frequency5,31. In summary, this means that the NA of an objective
alone is not enough to provide the resolution promised by
the Rayleigh criterion, but its magnification also has to match.
In the case of an objective with an NA of 0.65 (approximately the
largest commercially available NA, giving the best possible spatial
resolution), it needs at least a magnification of 40 μm / 0.59 μm =
68 (assuming a typical FPA pixel size of 40 μm × 40 μm). We used
a 74× objective (NA = 0.65) in our setup, leading to a pixel size of
0.54 μm × 0.54 μm (slight oversampling). In addition, this high
spatial sampling offers the advantage that subdiffraction objects
can be localized (but of course, not resolved) with an accuracy
better than the diffraction limit32.
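
The sampling arithmetic in the paragraph above can be reproduced with a few lines of Python. This is only a sketch for checking the quoted numbers; the variable names are illustrative and not taken from the original work.

```python
# Worked numbers for the sampling requirements discussed above.
# All input values come from the text; the variable names are illustrative.

wavelength_um = 2.5     # shortest mid-infrared wavelength considered
na = 0.65               # numerical aperture of the objective
fpa_pixel_um = 40.0     # physical FPA pixel size assumed in the text
pixels_per_airy = 8     # sampling required for diffraction-limited contrast

# Diameter of the Airy pattern (first minimum to first minimum).
airy_diameter_um = 1.22 * wavelength_um / na

# Largest pixel size at the sample plane giving ~8 pixels per Airy pattern.
max_pixel_um = airy_diameter_um / pixels_per_airy

# Less restrictive Nyquist-style limit with the practical factor 2.3.
f_cutoff_per_um = 2 * na / wavelength_um
nyquist_pixel_um = 1 / (2.3 * f_cutoff_per_um)

# Magnification needed to reach max_pixel_um, and the pixel size at 74x.
min_magnification = fpa_pixel_um / max_pixel_um
pixel_74x_um = fpa_pixel_um / 74

print(f"max pixel size (8 px per Airy pattern): {max_pixel_um:.2f} um")
print(f"max pixel size (Nyquist, factor 2.3):   {nyquist_pixel_um:.2f} um")
print(f"minimum magnification:                  {min_magnification:.0f}x")
print(f"pixel size with a 74x objective:        {pixel_74x_um:.2f} um")
```

The 74× objective therefore samples slightly more finely than the 0.59 μm requirement, which is the "slight oversampling" mentioned in the text.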

Instrument  design.  Synchrotron  storage  rings  are  excellent 
light  sources  for  aperture-based  infrared  microspectroscopy33 
as  the  small  horizontal  and  vertical  emittance  (source  etendue) 
of  conventional  single-beam  beamlines  and  the  relatively  small 
acceptance  (detector  system  etendue)  of  the  microscopy  system 
can  be  closely  matched  (Supplementary  Table  1).  Increasing  the 
photon flux by extracting a larger horizontal angle from a bending
magnet, however, is not beneficial because the additional
photons cannot be coupled efficiently to the small acceptance
of such microscopy systems. For wide-field microscopes without
throughput-restricting apertures, in contrast, single beams
from conventional beamlines have limited flux owing to their
relatively small emittance, making it challenging to match the
relatively large acceptance of a multichannel FPA imaging instrument.
The instrument described here substantially increased the
horizontal collection angle to match the large acceptance of a
wide-field imaging system to fully exploit the source brightness.
It is located at the Synchrotron Radiation Center in Stoughton,
Wisconsin, USA, which already houses a conventional aperture-based
infrared microscope. This synchrotron facility encourages
scientists to apply for peer-reviewed access to beamtime and/or
initiate a collaboration with the authors of this work. Applications
are accepted for review every six months and rapid requests for
initial experiments are handled more frequently
(http://www.src.wisc.edu/users/new_users.html).

We extracted 320 mrad × 27 mrad of infrared radiation from
a dedicated bending magnet and split this fan of radiation into
twelve beams with a set of twelve toroidal mirrors (M1; Fig. 1),
which refocused each beam (magnification of 1). Each beam
exited an ultrahigh vacuum chamber via one of twelve flat mirrors
(M2; Fig. 1) through one of twelve ZnSe windows (Fig. 1) into a
nitrogen-purged area. Next, twelve parabolic mirrors (M3; Fig. 1)
collimated the beams, followed by twelve stacked small flat mirrors
(M4; Fig. 1) that rearranged the beams into a 3 × 4 matrix. We
used a subsequent piezo-driven optical feedback system (not
shown) to stabilize the beam bundle, reduce vibration
effects and increase the SNR. Next, we sent the bundle through a
Vertex 70 (Bruker) spectrometer (Fig. 1), which was coupled to
a Hyperion 3000 (Bruker) infrared and visible light microscope.
There, the slightly defocused beam bundle illuminated the sample
area through a 15× or 20× Schwarzschild condenser (Fig. 1) to
spread out each beam so that the beams overlap spatially to provide
quasi-homogeneous illumination at the sample. Finally, a 74×
objective (Ealing) imaged the sample onto a 128 pixel × 128 pixel
FPA (Santa Barbara Focalplane), so that each pixel had an effective
geometrical area at the sample plane of 0.54 μm × 0.54 μm
(Fig. 1). Additional design details of the imaging system have
been reported elsewhere34. In contrast to other implementations
of thermal or synchrotron sources, our multibeam system allowed
us to uniformly illuminate an order of magnitude more pixels
simultaneously (96 pixels × 96 pixels; Fig. 1b) and used an objective
with a substantially higher NA of 0.65 with a correctly matched4
pixel size (0.54 μm × 0.54 μm) to maintain full diffraction-limited
resolution over the mid-infrared spectrum at a high SNR.
We used a condenser with an NA of ~0.6 to match the NA of
the objective. Owing to its higher NA, this objective delivered
38% and 23% higher spatial resolution (according to the Rayleigh
criterion) compared to previous studies (for example, the 15×
objective with NA = 0.4 and pixel size = 2.7 μm × 2.7 μm or 36×
objective with NA = 0.5 and pixel size = 1.1 μm × 1.1 μm)11,14.
Furthermore, owing to the multibeam design, a high synchrotron
storage ring current was not mandatory to obtain high SNR. The
~270 mA current of our storage ring was sufficient to achieve
similar SNR (Fig. 2d,f) leading to shorter acquisition times compared
to those reported in previous publications14. The present
design can cover more than double the sample area in equivalent
or shorter times with better spatial resolution as compared to
single synchrotron beam systems.
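
As a quick check of the 38% and 23% figures quoted above, the Rayleigh distance d = 0.61 λ / NA can be compared for the three objectives. The sketch below is illustrative only; the function names are not from the original work, and the result does not depend on the chosen wavelength or on the exact prefactor because both cancel in the ratio.

```python
def rayleigh_distance_um(wavelength_um, na):
    """Rayleigh criterion d = 0.61 * lambda / NA (unobscured lens)."""
    return 0.61 * wavelength_um / na


def resolution_gain_percent(na_old, na_new, wavelength_um=2.5):
    """Relative reduction of the resolvable distance when the NA increases."""
    d_old = rayleigh_distance_um(wavelength_um, na_old)
    d_new = rayleigh_distance_um(wavelength_um, na_new)
    return 100.0 * (1.0 - d_new / d_old)


gain_vs_na04 = resolution_gain_percent(0.4, 0.65)  # 15x objective, NA = 0.4
gain_vs_na05 = resolution_gain_percent(0.5, 0.65)  # 36x objective, NA = 0.5

print(f"NA 0.4 -> 0.65: {gain_vs_na04:.0f}% smaller resolvable distance")
print(f"NA 0.5 -> 0.65: {gain_vs_na05:.0f}% smaller resolvable distance")
```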

Synchrotron  sources  may  have  coherent  properties,  for  example, 
synchrotrons  with  pulse  lengths  shorter  than  tens  of  femtoseconds 
in  the  far  infrared.  The  present  source,  however,  had  nanosecond 
pulses,  and  we  designed  the  path  lengths  for  the  twelve  beams  to 
never  temporally  overlap  on  the  sample  or  detector  plane.  Hence, 
temporal coherence did not have an impact on the quality
of the images produced by the microscope. Experimentally we
observed no spectral evidence of spatial or temporal coherence
effects,  nor  any  impact  on  image  quality  or  resolution,  as  can  be 
seen,  for  example,  by  the  correspondence  between  the  thermal 
and  synchrotron  spectral  data. 

Experimental details, data processing, and samples. We conducted
conventional thermal source-based imaging on two
commercial systems: Stingray (Varian; Fig. 2c) using an FPA
detector and Spotlight 400 (PerkinElmer; Figs. 2a,b and 3d,e
and Supplementary Fig. 1a) equipped with a single element
and a 16-pixel linear array detector. We acquired the synchrotron
point-by-point scanning image (Supplementary Fig. 1b)
on a Continuμm (Thermo Nicolet) dual-aperture microscope
connected to beamline 031, and we collected the remaining
images with the multibeam synchrotron system connected to
a Hyperion 3000 (Bruker) microscope at beamline 021, both
at the Synchrotron Radiation Center. The Varian, PerkinElmer
and Thermo Nicolet measurements used a Happ-Genzel apodization,
the Bruker measurements a Norton-Beer (medium) apodization.
We baseline-corrected the images in Figures 2 and 3 (including
spectra) and Supplementary Figures 1 and 7; all other infrared
images as well as spectra show raw data. We did not use postacquisition
smoothing or filtering. The infrared data were analyzed
and images were created with software packages IRidys
(in-house development) and ENVI (ITT VIS).

The prostate cancer sample (Gleason grade 6) with epithelial
cells (Fig. 2 and Supplementary Fig. 1) was a viable tumor without
necrosis, in a cribriforming pattern and had some strands
of stroma crossing through it. A second prostate cancer sample,
which was also Gleason grade 6 for comparison (Fig. 3a-e),
had chronic inflammation (mostly mononuclear cell infiltration
of macrophages and lymphocytes) and contained two glands, a
small vessel with a muscular wall and capillaries (with blood). The
tissue shown in Figure 3f was a normal human breast tissue core
including the terminal ductal lobular unit (TDLU) region and
the tissue shown in Figure 3h contained two benign prostate
glands from a cancerous prostate tissue core (Gleason grade 6).
Tissues used here were from anonymized samples from individuals
and involved secondary analysis as approved by the University
of Illinois at Urbana-Champaign Institutional Review Board,
protocol 06684. We fixed all biomedical samples in 4% paraformaldehyde,
embedded them in paraffin, sectioned them at a
thickness of 4 μm, mounted them on a BaF2 infrared transparent
window and deparaffinized them with hexane for 48 h before
measurement. In transmission mode sample thickness can affect
the obtainable spatial resolution. Using a simple geometric model
we estimated that the sample thickness should not be above ~3-4
μm to achieve full diffraction-limited resolution.

We  purchased  the  apertures  (Supplementary  Figs.  5  and  6) 
from  National  Aperture,  Inc.,  the  high-resolution  US  Air  Force 
(USAF)  test  target  (Supplementary  Fig.  3)  from  Edmund  Optics 
Inc.  and  the  polystyrene  beads  (Supplementary  Fig.  7)  from 
Polysciences,  Inc.  We  diluted  the  polystyrene  bead  suspension 
with  water,  dispensed  it  on  an  ultrathin  formvar  film  substrate 
and  then  air-dried  it. 

We recorded images of polystyrene beads with diameters of ~1
and 2 μm (acquisition time, ~5 min) to examine spectral limits
of detection per pixel. We detected the 6 ± 1 fmol or 3.4 × 10^9
(± 0.7 × 10^9; s.d.) CH2 groups contained in a 1 μm polystyrene
bead (mass, 600 ± 100 fg; volume, 0.6 ± 0.1 fl) in a single 0.54 μm
× 0.54 μm pixel using the International Union of Pure and Applied
Chemistry (IUPAC) detection limit criterion (Supplementary
Fig. 7). We estimated this to be ~100-fold better than with current
instrumentation16 and this compared favorably with the lowest
detection limit reported35 using destructive methods.
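
The quoted amount of substance and number of CH2 groups can be roughly cross-checked from the bead mass. The sketch below assumes that the amount refers to styrene repeat units (C8H8, about 104.15 g/mol, one CH2 group per unit); this interpretation is an assumption made here for illustration and is not stated explicitly in the text.

```python
# Rough consistency check of the detection-limit figures quoted above.
# Assumption (not from the text): the quoted amount counts styrene repeat
# units, each of which carries exactly one CH2 group.

AVOGADRO_PER_MOL = 6.02214076e23
STYRENE_UNIT_G_PER_MOL = 104.15    # molar mass of one C8H8 repeat unit

bead_mass_g = 600e-15              # 600 fg, as quoted in the text

repeat_units_mol = bead_mass_g / STYRENE_UNIT_G_PER_MOL
repeat_units_fmol = repeat_units_mol * 1e15
ch2_groups = repeat_units_mol * AVOGADRO_PER_MOL

print(f"amount of repeat units: {repeat_units_fmol:.1f} fmol")
print(f"number of CH2 groups:   {ch2_groups:.2e}")
```

Under this assumption both values fall within the stated uncertainties of 6 ± 1 fmol and (3.4 ± 0.7) × 10^9 CH2 groups.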
Abstract. In experimental sciences, diversity tends to hinder the proper
generalization of predictive models across data provided by different
laboratories. Thus, training on a data set produced by one lab and testing
on data provided by another lab usually results in low classification
accuracy. Even when the same protocols are followed, variability
in measurements can introduce unforeseen variations that affect the
quality of the model. This paper proposes a Genetic Programming based
approach, where a transformation of the data from the second lab is
evolved driven by classifier performance. A real-world problem, prostate
cancer diagnosis, is presented as an example where the proposed
approach was capable of repairing the fracture between the data of two
different laboratories.


1  Introduction 

The  assumption  that  a  properly  trained  classifier  will  be  able  to  predict  the 
behavior  of  unseen  data  from  the  same  problem  is  at  the  core  of  any  automatic 
classification  process.  However,  this  hypothesis  tends  to  prove  unreliable  when 
dealing  with  biological  data  (or  other  experimental  sciences),  especially  when 
such  data  is  provided  by  more  than  one  laboratory,  even  if  they  are  following 
the  same  protocols  to  obtain  it. 

This  paper  presents  an  example  of  such  a  case,  a  prostate  cancer  diagnosis 
problem  where  a  classifier  built  using  the  data  of  the  first  laboratory  performs 
very accurately on the test data from that same laboratory, but comparatively
poorly  on  the  data  from  the  second  one.  It  is  assumed  that  this  behavior  is  due  to 
a  fracture  between  the  data  of  the  two  laboratories,  and  a  Genetic  Programming 
(GP)  method  is  developed  to  homogenize  the  data  in  subsequent  subsets.  We 
consider  this  method  a  form  of  feature  extraction  because  the  new  dataset  is 
constructed  with  new  features  which  are  functional  mappings  of  the  old  ones. 

The  method  presented  in  this  paper  attempts  to  optimize  a  transformation 
over  the  data  from  the  second  laboratory,  in  terms  of  classifier  performance. 
That  is,  the  data  from  the  second  lab  is  transformed  into  a  new  dataset  where 
the  classifier,  trained  on  the  data  from  the  first  lab,  performs  as  accurately  as 
possible.  If  the  performance  achieved  by  the  classifier  in  this  new,  transformed, 
dataset,  is  equivalent  to  the  one  obtained  in  the  data  from  the  first  lab,  we 
understand  the  data  has  been  homogenized. 

More formally, the classifier f is trained on data from one laboratory (dataset
A), such that y = f(x_A) is the class prediction for one instance x_A of dataset
A. For the data from the other lab (dataset B), it is assumed that there exists
a transformation T such that f(T(x_B)) is a good classifier for instances x_B
of dataset B. The 'goodness' of the classifier is measured by the loss function
l(f(T(x_B)), y), where y is the class associated with x_B, and l(., .) is a measure
of distance between f(T(x_B)) and y. The aim is to find a transformation T such
that the average loss over all instances in B is minimized.
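
Written as an optimization problem, the aim stated above could be expressed as follows. The symbol T* for the sought transformation and the notation |B| for the number of instances in B are introduced here only for illustration.

```latex
T^{*} \;=\; \arg\min_{T} \; \frac{1}{|B|} \sum_{(x_B,\, y) \in B} l\bigl(f(T(x_B)),\, y\bigr)
```

If l is taken to be the 0-1 loss (an assumption; the text leaves l generic), minimizing this average is the same as maximizing the classification accuracy of f on the transformed data.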

The remainder of this paper is organized as follows: In Section 2, some
preliminaries about the techniques used and some approaches to similar problems
in the literature are presented. Section 3 has a description of the proposed
algorithm. Section 4 details the real-world biological dataset that motivates
this paper. Section 5 includes the experimental setup, along with the results obtained, and an
analysis.  Finally,  some  concluding  remarks  are  made  in  Section  6. 

2  Preliminaries 

This  section  is  divided  in  the  following  way:  In  Section  2.1  we  introduce  the 
notation  that  has  been  used  in  this  paper.  Then  we  include  a  brief  summary  of 
what  has  been  done  in  feature  extraction  in  Section  2.2,  and  a  short  review  of 
the  different  approaches  we  found  in  the  specialized  literature  on  the  use  of  GP 
for  feature  extraction  in  Section  2.3. 

2.1  Notation 

When  describing  the  problem,  datasets  A,  B  and  S  correspond  to: 

—  A:  The  original  dataset,  provided  by  the  first  lab,  that  was  used  to  build  the 
classifier. 

—  B:  The  problem  dataset,  from  the  second  lab.  The  classifier  is  not  accurate 
on  this  dataset,  and  that  is  what  the  proposed  algorithm  attempts  to  solve. 

—  S:  The  solution  dataset,  result  of  applying  the  evolved  transformation  to  the 
samples  in  dataset  B.  The  goal  is  to  have  the  classifier  performance  be  as 
high  as  possible  on  this  dataset. 


On  the  Homogenization  of  Data  from  Two  Laboratories 



2.2  Feature  Extraction 

Feature extraction is one form of pre-processing, which creates new features as
functional mappings of the old ones. An early proposer of such a term was probably
Wyse in 1980 [1], in a paper about intrinsic dimensionality estimation. There
are multiple techniques that have been applied to feature extraction throughout
the years, ranging from principal component analysis (PCA) to support vector
machines (SVMs) to GAs (see [2,3,4], respectively, for some examples).

Among the foundational papers in the literature, Liu's book in 1998 [5] is one
of the earlier compilations of the field. A workshop held in 2003 [6] led Guyon
& Elisseeff to publish a book with an important treatment of the foundations of
feature extraction [7].

2.3  Genetic  Programming-Based  Feature  Extraction 

Genetic Programming (GP) has been used extensively to optimize feature
extraction and selection tasks. One of the first contributions in this line was the
work  published  by  Tackett  in  1993  [8],  who  applied  GP  to  feature  discovery  and 
image  discrimination  tasks. 

We  can  consider  two  main  branches  in  the  philosophy  of  GP-based  feature 
extraction: 

1  On one hand, we have the proposals that focus only on the feature extraction
procedure, of which there are multiple examples: Sherrah et al. [9] presented
in 1997 the evolutionary pre-processor (EPrep), which searches for an optimal
feature extractor by minimizing the misclassification error over three
randomly selected classifiers. Kotani et al.'s work from 1999 [10] determined
the optimal polynomial combinations of raw features to pass to a k-nearest
neighbor classifier. In 2001, Bot [11] evolved transformed features, one at a
time, again for a k-NN classifier, utilizing each new feature only if it improved
the overall classification performance. Zhang & Rockett, in 2006, [12] used
multiobjective GP to learn optimal feature extraction in order to fold the
high-dimensional pattern vector to a one-dimensional decision space where
the classification would be trivial. Lastly, also in 2006, Guo & Nandi [13]
optimized a modified Fisher discriminant using GP, and then Zhang & Rockett
[14] extended their work by using a multiobjective approach to prevent tree
bloat.

2  On  the  other  hand,  some  authors  have  chosen  to  evolve  a  full  classifier  with 
an  embedded  feature  extraction  step.  As  an  example,  Harris  [15]  proposed  in 
1997  a  co-evolutionary  strategy  involving  the  simultaneous  evolution  of  the 
feature  extraction  procedure  along  with  a  classifier.  More  recently,  Smith  & 
Bull  [16]  developed  a  hybrid  feature  construction  and  selection  method  using 
GP  together  with  a  GA. 

2.4  Finding  and  Repairing  Fractures  between  Data 

Among the proposals to quantify the fracture in the data, we would like to
mention the one by Wang et al. [17], where the authors present the idea of
correspondence tracing. They propose an algorithm for discovering changes
in classification characteristics, which is based on the comparison between two
rule-based classifiers, one built from each dataset. Yang et al. [18] presented in
2008 the idea of conceptual equivalence as a method for contrast mining, which
consists of the discovery of discrepancies between datasets. Lastly, it is important
to mention the work by Cieslak and Chawla [19], which presents a statistical
framework to analyze changes in data distribution resulting in fractures between
the data.

The fundamental difference between the mentioned works and this one is that we
focus on repairing the fracture by modifying the data, using a general method
that works with any kind of data fracture, while they propose methods to quantify
said fracture that work only under certain conditions.


3  A  Proposal  for  GP-Based  Feature  Extraction  to 
Homogenize  Data  from  Two  Laboratories 

The problem we are attempting to solve is the design of a method that can
transform a dataset (dataset B), on which a classification model built
using the data from a different dataset (dataset A) is not accurate, into a new
dataset (dataset S) on which the classifier is more accurate. Said classifier is kept
unchanged throughout the process.

We  decided  to  use  GP  to  solve  the  problem  for  a  number  of  reasons: 

1  It is well suited to evolve arbitrary expressions because its chromosomes are
trees. This is useful in our case because we want to have the maximum possible
flexibility in terms of the functional expressions of these transformations.

2  GP provides highly interpretable solutions. This is an advantage because our
goal is not only to have a new dataset where the classifier works, but also to
analyze what the problem was in the first dataset.

Once  GP  was  chosen,  we  needed  to  decide  what  terminals  and  operators  to  use, 
how  to  calculate  the  fitness  of  an  individual  and  which  evolutionary  parameters 
(population  size,  number  of  generations,  selection  and  mutation  rates,  etc)  are 
appropriate  for  the  problem  at  hand. 

3.1  Solutions  Representation:  Context-Free  Grammar 

The representation of the solutions was achieved by extending GP to evolve
more than one tree per solution. Each individual is composed of n trees, where
n is the number of attributes present in the dataset. We are trying to develop a
new dataset with the same number of attributes as the old one, since this new
dataset needs to be fed to the existing model. In the tree structure, the leaves
are either constants (we use the Ephemeral Random Constant approach [20]) or
attributes from the original dataset. The intermediate nodes are functions from
the function set, which is specific to each problem.

The attributes on the transformed dataset are represented by algebraic expressions.
These expressions are generated according to the rules of a context-free
grammar which allows the absence of some of the functions or terminals. The
grammar corresponding to the example problem would look like this:

Start → Tree Tree
Tree → Node
Node → Node Operator Node
Node → Terminal
Operator → + | − | * | ÷
Terminal → x0 | x1 | E
E → realNumber (represented by e)
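
To make the grammar concrete, the following sketch generates random individuals by expanding its rules recursively. It is an illustration only: the tuple-based tree encoding, the probability of stopping at a terminal, the range of the random constants and the depth limit used during initialization are all assumptions made here, not details given in the paper.

```python
import random

OPERATORS = ["+", "-", "*", "/"]   # "/" stands in for the division symbol


def random_node(rng, n_attributes, max_depth, p_terminal=0.3):
    """Expand the Node rule: either a Terminal or (Operator, Node, Node)."""
    if max_depth <= 1 or rng.random() < p_terminal:
        if rng.random() < 0.5:
            return f"x{rng.randrange(n_attributes)}"    # Terminal -> x_i
        return round(rng.uniform(-1.0, 1.0), 3)         # Terminal -> E
    left = random_node(rng, n_attributes, max_depth - 1, p_terminal)
    right = random_node(rng, n_attributes, max_depth - 1, p_terminal)
    return (rng.choice(OPERATORS), left, right)


def random_individual(rng, n_attributes, max_depth=5):
    """Start rule: one expression tree per attribute of the dataset."""
    return [random_node(rng, n_attributes, max_depth)
            for _ in range(n_attributes)]


def depth(tree):
    """Number of levels in a tree; a single terminal has depth 1."""
    if isinstance(tree, tuple):
        return 1 + max(depth(tree[1]), depth(tree[2]))
    return 1


individual = random_individual(random.Random(0), n_attributes=2)
for index, tree in enumerate(individual):
    print(f"new attribute {index}: {tree}")
```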


3.2  Fitness  Evaluation 

The  fitness  evaluation  procedure  is  probably  the  most  treated  aspect  of  design 
in  the  literature  when  dealing  with  GP-based  feature  extraction.  As  has  been 
stated  before,  the  idea  is  to  have  the  provided  classifier’s  performance  drive 
the  evolution.  To  achieve  that,  our  method  calculates  fitness  as  the  classifier’s 
accuracy  over  the  dataset  obtained  by  applying  the  transformations  encoded  in 
the  individual  (training-set  accuracy). 
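
A minimal sketch of this fitness computation is given below. It assumes that trees are encoded as nested tuples of the form (operator, left, right), that attributes are strings such as "x0", and that division by zero returns 1.0 (a common protected-division convention); none of these details are specified in the paper, and the toy classifier and data are hypothetical.

```python
def evaluate(tree, row):
    """Evaluate one expression tree on one instance (a list of attributes)."""
    if isinstance(tree, tuple):
        op, left, right = tree
        a, b = evaluate(left, row), evaluate(right, row)
        if op == "+":
            return a + b
        if op == "-":
            return a - b
        if op == "*":
            return a * b
        if op == "/":
            return a / b if b != 0 else 1.0   # protected division (assumption)
        raise ValueError(f"unknown operator: {op}")
    if isinstance(tree, str):                 # attribute reference, e.g. "x0"
        return row[int(tree[1:])]
    return float(tree)                        # constant terminal


def fitness(individual, classifier, dataset_b, labels_b):
    """Accuracy of the fixed classifier on the transformed dataset."""
    transformed = [[evaluate(tree, row) for tree in individual]
                   for row in dataset_b]
    correct = sum(classifier(x) == y for x, y in zip(transformed, labels_b))
    return correct / len(labels_b)


# Hypothetical example: a threshold classifier "trained" on lab A, and data
# from lab B whose first attribute is shifted by a constant offset of 1.0.
classifier = lambda x: int(x[0] > 0.5)
dataset_b = [[1.2, 0.0], [1.4, 0.0], [1.7, 0.0], [1.9, 0.0]]
labels_b = [0, 0, 1, 1]

identity = ["x0", "x1"]                # leaves dataset B unchanged
shifted = [("-", "x0", 1.0), "x1"]     # removes the offset

fitness_identity = fitness(identity, classifier, dataset_b, labels_b)
fitness_shifted = fitness(shifted, classifier, dataset_b, labels_b)
print(fitness_identity, fitness_shifted)
```

In this toy case the individual that undoes the offset receives the higher fitness, which is the pressure that drives the evolution.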

3.3  Genetic  Operators 

This section details the choices made for selection, crossover and mutation
operators. Since the objective of this work is not to squeeze the maximum possible
performance  from  GP,  but  rather  to  show  that  it  is  an  appropriate  technique  for 
the  problem  and  that  it  can  indeed  solve  it,  we  did  not  pay  special  attention  to 
these  choices,  and  picked  the  most  common  ones  in  the  specialized  literature. 

—  Tournament selection without replacement: To perform this selection, s
individuals are first randomly picked from the population (where s is the
tournament size), while avoiding using any member of the population more than
once. The selected individual is then chosen as the one with the best fitness
among those picked in the first stage.

—  One-point crossover: A subtree from one of the parents is substituted by one
from the other parent. This procedure is carried out in the following way:

1  Randomly select a non-root non-leaf node on each of the two parents.

2  The first child is the result of swapping the subtree below the selected
node in the father for that of the mother.

3  The second child is the result of swapping the subtree below the selected
node in the mother for that of the father.

—  Swap mutation: This is a conservative mutation operator that helps diversify
the search within a close neighborhood of a given solution. It consists of
exchanging the primitive associated with a node for one that has the same
number of arguments.

—  Replacement mutation: This is a more aggressive mutation operator that
leads to diversification in a larger neighborhood. The procedure to perform
this mutation is the following:

1  Randomly select a non-root non-leaf node on the tree to mutate.

2  Create a random tree of depth no more than a fixed maximum depth.
In this work, the maximum depth allowed was 5.

3  Swap the subtree below the selected node for the randomly generated
one.
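
The selection and crossover operators described in the list above can be sketched as follows. The nested-tuple tree encoding and the helper names are assumptions made for illustration. Two interpretation choices are also assumptions: the sketch swaps the whole subtree rooted at the selected node, and it avoids repeated picks only within a single tournament.

```python
import random


def internal_paths(tree, path=()):
    """Paths (child indices from the root) to all non-root, non-leaf nodes."""
    paths = []
    if isinstance(tree, tuple):
        if path:                                  # skip the root itself
            paths.append(path)
        paths += internal_paths(tree[1], path + (1,))
        paths += internal_paths(tree[2], path + (2,))
    return paths


def get_subtree(tree, path):
    for index in path:
        tree = tree[index]
    return tree


def replace_subtree(tree, path, new_subtree):
    """Return a copy of tree with the node at path replaced by new_subtree."""
    if not path:
        return new_subtree
    parts = list(tree)
    parts[path[0]] = replace_subtree(tree[path[0]], path[1:], new_subtree)
    return tuple(parts)


def one_point_crossover(father, mother, rng):
    father_paths = internal_paths(father)
    mother_paths = internal_paths(mother)
    if not father_paths or not mother_paths:
        return father, mother                     # nothing eligible to swap
    fp, mp = rng.choice(father_paths), rng.choice(mother_paths)
    child1 = replace_subtree(father, fp, get_subtree(mother, mp))
    child2 = replace_subtree(mother, mp, get_subtree(father, fp))
    return child1, child2


def tournament_select(population, fitnesses, size, rng):
    """Pick `size` distinct individuals and return the fittest of them."""
    contestants = rng.sample(range(len(population)), size)
    winner = max(contestants, key=lambda i: fitnesses[i])
    return population[winner]


rng = random.Random(1)
father = ("+", ("*", "x0", "x1"), "x0")
mother = ("-", "x1", ("/", "x0", 2.0))
child1, child2 = one_point_crossover(father, mother, rng)
print(child1)
print(child2)
```

Each parent here has exactly one eligible node, so the outcome does not depend on the random seed.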

3.4  Function  Set 

The choice of functions to include in the function set usually depends on the
problem. Since one of our goals is to have an algorithm as universal and robust
as possible, where the user does not need to fine-tune any parameters to
achieve good performance, we decided not to study the effect of different function
set choices. We chose the default functions most authors use in the literature:
{+, −, *, ÷, exp, cos}.

3.5  Parameters 

Table  1  summarizes  the  parameters  used  for  the  experiments. 

Table 1. Evolutionary parameters for an nv-dimensional problem

Parameter                                  Value
Number of trees                            nv
Population size                            400 * nv
Duration of the run                        100 generations
Selection operator                         Tournament without replacement
Tournament size                            log2(nv) + 1
Crossover operator                         One-point crossover
Crossover probability                      0.9
Mutation operator                          Replacement & Swap mutations
Replacement mutation probability           0.001
Swap mutation probability                  0.01
Maximum depth of the swapped in subtree    5
Function set                               {+, −, *, ÷, cos, exp}
Terminal set                               {x0, x1, ..., x_{nv−1}, e}
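
Because several parameters in Table 1 scale with the number of attributes nv, a small helper can make the concrete values explicit. The function name and dictionary keys below are hypothetical, and rounding log2(nv) down to an integer is an assumption, since the table does not say how non-integer values are handled.

```python
import math


def gp_parameters(nv):
    """Concrete evolutionary parameters for an nv-dimensional problem."""
    return {
        "number_of_trees": nv,
        "population_size": 400 * nv,
        "generations": 100,
        # Rounding down is an assumption; the table only gives log2(nv) + 1.
        "tournament_size": int(math.log2(nv)) + 1,
        "crossover_probability": 0.9,
        "replacement_mutation_probability": 0.001,
        "swap_mutation_probability": 0.01,
        "max_swapped_subtree_depth": 5,
    }


params = gp_parameters(2)   # the two-attribute example used for the grammar
print(params["population_size"], params["tournament_size"])
```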

3.6  Execution  Flow 

Algorithm 1 contains a summary of the execution flow of the GP procedure,
which follows a classical evolutionary scheme. It stops after a user-defined
number of generations.
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Algorithm  1.  Execution  flow  of  the  GP  method 

1.  Randomly  create  the  initial  population  by  applying  the 
context-free  grammar  in  Section  3.1. 

2.  Repeat  Ng  times  (where  Ng  is  the  number  of  generations) 

2.1  Evaluate  the  current  population,  using  the  procedure 

seen  in  Section  3.2. 

2.2  Apply  selection  and  crossover  to  create  a  new 

population  that  will  replace  the  old  one. 

2.3  Apply  the  mutation  operators  to  the  new  population. 

3.  Return  the  best  individual  ever  seen. 
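
The control flow of Algorithm 1 can be written as a short generational loop. The sketch below is not the original implementation: the callables `init_population`, `evaluate`, `select_and_cross` and `mutate` are hypothetical placeholders for the procedures of Sections 3.1 to 3.3, and the toy problem at the bottom (maximizing -(x - 3)^2 over real numbers) only exercises the loop.

```python
import random


def run_gp(init_population, evaluate, select_and_cross, mutate, n_generations, rng):
    """Generational loop of Algorithm 1; every callable is a placeholder."""
    population = init_population(rng)                             # step 1
    best, best_fitness = None, float("-inf")
    for _ in range(n_generations):                                # step 2
        fitness = [evaluate(ind) for ind in population]           # step 2.1
        for ind, fit in zip(population, fitness):
            if fit > best_fitness:                                # best individual ever seen
                best, best_fitness = ind, fit
        population = select_and_cross(population, fitness, rng)   # step 2.2
        population = [mutate(ind, rng) for ind in population]     # step 2.3
    return best, best_fitness                                     # step 3


# Toy stand-ins: individuals are plain floats instead of expression trees.
def toy_init(rng):
    return [rng.uniform(-10, 10) for _ in range(20)]


def toy_evaluate(x):
    return -(x - 3.0) ** 2


def toy_select_and_cross(population, fitness, rng):
    def tournament():
        a, b = rng.sample(range(len(population)), 2)   # two distinct contestants
        return population[a] if fitness[a] >= fitness[b] else population[b]
    return [(tournament() + tournament()) / 2 for _ in population]


def toy_mutate(x, rng):
    return x + rng.gauss(0, 0.1) if rng.random() < 0.1 else x


best, best_fitness = run_gp(toy_init, toy_evaluate, toy_select_and_cross,
                            toy_mutate, 30, random.Random(1))
```

Because the best individual is tracked outside the population, it survives even if selection or mutation later removes it, which is what step 3 of the algorithm requires.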


4  Case  Study:  Prostate  Cancer  Diagnosis 

Prostate  cancer  is  the  most  common  non-skin  malignancy  in  the  western  world. 
The  American  Cancer  Society  estimated  192,280  new  cases  and  27,360  deaths 
related  to  prostate  cancer  in  2009  [21].  Recognizing  the  public  health  implications 
of  this  disease,  men  are  actively  screened  through  digital  rectal  examinations 
and/or  serum  prostate  specific  antigen  (PSA)  level  testing.  If  these  screening 
tests  are  suspicious,  prostate  tissue  is  extracted,  or  biopsied,  from  the  patient 
and  examined  for  structural  alterations.  Due  to  imperfect  screening  technologies 
and  repeated  examinations,  it  is  estimated  that  more  than  one  million  people 
undergo  biopsies  in  the  US  alone. 

4.1  Diagnostic  Procedure 

Biopsy, followed by manual examination under a microscope, is the primary
means to definitively diagnose prostate cancer as well as most internal cancers
in the human body. Pathologists are trained to recognize patterns of disease in
the architecture of tissue, local structural morphology and alterations in cell size
and shape. Specific patterns of specific cell types distinguish cancerous and
non-cancerous tissues. Hence, the primary task of the pathologist examining tissue
for cancer is to locate foci of the cell of interest and examine them for alterations
indicative of disease. A detailed explanation of the procedure is beyond the scope
of this paper and can be found elsewhere [22,23,24,25].

Operator fatigue is well documented, and guidelines limit the workload and
rate of examination of samples by a single operator (examination speed and
throughput). Importantly, inter- and intra-pathologist variation complicates
decision making. For this reason, an accurate automatic classifier that helps
reduce the load on pathologists would be highly valuable. This was partially
achieved in [24], but some issues remain open.

4.2  The  Generalization  Problem 

Llorà et al. [24] successfully applied a genetics-based approach to the
development of a classifier that obtained human-competitive results based on FTIR
data. However, the classifier built from the data obtained from one laboratory
proved remarkably inaccurate when applied to classify data from a different
hospital. The whole experimental procedure was identical: the same machine,
measuring and post-processing were used, and the lab protocols for both tissue
extraction and staining were exactly the same. There was therefore no apparent
factor that could explain this discrepancy.

What  we  attempt  to  do  with  this  work  is  develop  an  algorithm  that  can 
evolve  a  transformation  over  the  data  from  the  second  laboratory,  creating  a  new 
dataset  where  the  classifier  built  from  the  first  lab  is  as  accurate  as  possible. 

4.3  Pre-processing  of  the  Data 

The biological data obtained from the laboratories is enormous (in the range of
14 GB of storage per sample), and parallel computing was needed to achieve
better-than-human results. For this reason, feature selection was performed on
the dataset obtained by FTIR. It was done by evaluating the pairwise error and
the incremental increase in classification accuracy for every class, resulting in
a subset of 93 attributes. This reduced dataset provided enough information for
classifier performance to be rather satisfactory: a simple C4.5 classifier
achieved approximately 95% accuracy on the data from the first lab, but only
approximately 80% on the second one. The dataset consists of 789 samples from one
laboratory and 665 from the other one. These samples represent 0.01% of the
total data available for each dataset and were selected by applying stratified
sampling without replacement. A detailed description of the data pre-processing
procedure can be found in [22].
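
For readers unfamiliar with the sampling scheme, the following sketch shows one common way to draw a stratified sample without replacement. It is a generic illustration, not the pre-processing code described in [22]; the function name, the `fraction` argument and the rule of keeping at least one sample per class are assumptions.

```python
import random
from collections import defaultdict


def stratified_sample(labels, fraction, rng):
    """Pick about `fraction` of the indices from every class, without replacement."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    chosen = []
    for label in sorted(by_class):
        members = by_class[label]
        k = max(1, round(fraction * len(members)))   # assumption: at least one per class
        chosen.extend(rng.sample(members, k))        # sample() never repeats an index
    return sorted(chosen)
```

Sampling each class separately preserves the class proportions of the full data in the reduced dataset.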

The experiments reported in this paper were performed using the reduced
dataset, since the associated computational costs make it infeasible to work
with the complete one. The reduced dataset is made of 93 real-valued attributes
and two classes (positive and negative diagnosis), with a 60%-40% class
distribution.

5  Experimental  Study 

This section is organized as follows. Section 5.1 gives a general description
of the experimental procedure, together with the parameters used for the
experiment. The results obtained are presented in Section 5.2, a statistical
analysis is shown in Section 5.3, and lastly some sample transformations are
shown in Section 5.4.

5.1  Experimental  Framework 

The experimental methodology can be summarized as follows:

1 Consider each of the provided datasets (one from each lab) to be datasets A
  and B respectively.

2 From dataset A, build a classifier. We chose C4.5 [26], but any other classifier
  would work exactly the same, because the proposed method uses the learned
  classifier as a black box.

3 Apply our method to dataset B in order to evolve a transformation that will
  create a solution dataset S. Use 5-fold cross validation over dataset S, so
  that training and test set accuracy results can be obtained.

4 Check the performance of the step 2 classifier on dataset S. Ideally, it should
  be close to the one on dataset A, meaning the proposed method has successfully
  discovered the hidden transformation and inverted it.
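
The role of the black-box classifier in steps 2 to 4 can be illustrated with a toy example. Everything below is a hypothetical stand-in: a simple threshold rule plays the classifier learned on A, dataset B is A with its first attribute shifted by a constant, and the "evolved" transformation is written by hand. The sketch only shows how a candidate transformation (one expression per attribute, each free to read any input attribute) would be scored by the accuracy of the fixed classifier.

```python
def accuracy(classifier, samples, labels):
    """Fraction of samples for which the black-box classifier predicts the right label."""
    hits = sum(1 for x, y in zip(samples, labels) if classifier(x) == y)
    return hits / len(samples)


def transform_dataset(transformation, samples):
    """Apply one expression per attribute; each expression sees the whole input row."""
    return [[f(row) for f in transformation] for row in samples]


def fitness(transformation, classifier, samples_b, labels_b):
    """Accuracy of the fixed classifier on the transformed dataset S."""
    return accuracy(classifier, transform_dataset(transformation, samples_b), labels_b)


# Toy stand-ins (hypothetical): threshold classifier and a shifted copy of A.
classifier = lambda row: int(row[0] > 0.5)
samples_a = [[0.1, 1.0], [0.4, 0.0], [0.7, 1.0], [0.9, 0.0]]
labels = [0, 0, 1, 1]
samples_b = [[x0 + 2.0, x1] for x0, x1 in samples_a]

identity = [lambda r: r[0], lambda r: r[1]]        # leaves B untouched
repair = [lambda r: r[0] - 2.0, lambda r: r[1]]    # inverts the hidden shift

print(accuracy(classifier, samples_a, labels))             # 1.0 on A
print(fitness(identity, classifier, samples_b, labels))    # 0.5 on untransformed B
print(fitness(repair, classifier, samples_b, labels))      # 1.0 on S
```

The classifier is never retrained or inspected; only its predictions on the transformed data are used, which is why any other learner could be substituted for C4.5.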


5.2  Performance  Results 

This section presents the results for the Prostate Cancer problem, in terms of
classifier accuracy. The results obtained can be seen in Table 2.


Table 2. Classifier performance results

Classifier performance in dataset ...

A-training  A-test   B        S-training  S-test
----------  -------  -------  ----------  -------
0.95435     0.92015  0.83570  0.95191     0.92866

The performance results are promising. First and foremost, the proposed
method was able to find a transformation over the data from the second
laboratory that made the classifier work just as well as it did on the data from
the first lab (test accuracy of 0.929 on S against 0.920 on A, up from 0.836 on
the untransformed B), effectively finding the fracture in the data (that is, the
difference in data distribution between the datasets provided by the two labs)
that prevented the classifier from working accurately.

5.3  Statistical  Analysis 

To  complete  the  experimental  study,  we  performed  a  statistical  comparison 
between  the  classifier  performance  over  datasets  A,  B  and  S. 

In [27,28,29,30] a set of simple, safe and robust non-parametric tests for
statistical comparisons of classifiers is recommended. One of them is the Wilcoxon
Signed-Ranks Test [31,32], which is the test that we have selected to do the
comparison.

In order to perform the Wilcoxon test, we used the results from each partition
in the 5-fold cross validation procedure. We ran the experiment four times,
resulting in 4 * 5 = 20 performance samples to carry out the statistical test. R+
is the sum of ranks for the samples in which the first algorithm in the comparison
wins, and R- the sum of ranks for those in which the second one wins.
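
The rank sums themselves are easy to compute by hand. The sketch below is a plain-Python illustration, not the statistical software used for the experiments: it ranks the absolute paired differences, averages the ranks of ties, and discards zero differences (an assumption; some formulations instead split the ranks of zero differences between the two sums). With 20 paired samples and no zero differences, the two sums always add up to 20 * 21 / 2 = 210, which matches the totals of every row in Table 3.

```python
def signed_rank_sums(first, second):
    """Return (R_plus, R_minus) for paired samples, using average ranks for ties."""
    diffs = [a - b for a, b in zip(first, second) if a != b]   # drop zero differences
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average = (i + j) / 2 + 1            # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    r_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)     # first algorithm wins
    r_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)    # second algorithm wins
    return r_plus, r_minus
```

For example, `signed_rank_sums([1, 2, 3], [2, 1, 5])` gives `(1.5, 4.5)`: the two differences of magnitude 1 share the average rank 1.5, and the difference of magnitude 2 receives rank 3.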

Table 3. Wilcoxon signed-ranks test results

Comparison                R+    R-    p-value    Null hypothesis of equality
------------------------  ----  ----  ---------  -------------------------------
A-test vs B               210   0     1.91E-007  rejected (A-test outperforms B)
B vs S-test               0     210   1.91E-007  rejected (S-test outperforms B)
A-training vs S-training  126   84    -          accepted
A-test vs S-test          84    126   -          accepted

We can conclude that our method has proved capable of homogenizing the data
from both laboratories with respect to classifier performance, on both the
training and the test sets: the test does not reject equality between datasets A
and S, while both clearly outperform dataset B.

5.4  Obtained  Transformations 

Figure 1 contains a sample of some of the evolved expressions for the best
individual found by our method. Since the dataset has 93 attributes, the individual
was composed of 93 trees, but for space reasons only the attributes relevant to
the C4.5 classifier were included here.

Fig. 1. Tree representation of the expressions contained in a solution to the
Prostate Cancer problem


6  Concluding  Remarks 

We have presented a new algorithm that addresses a common real-life problem
for which not many solutions have been proposed in evolutionary computing.
The problem in question is the repairing of fractures between data by adjusting
the data itself, not the classifiers built from it.

We have developed a solution to the problem by means of a GP-based
algorithm that performs feature extraction on the problem dataset driven by the
accuracy of the previously built classifier.

We have applied our method to a real-world problem where data from two
different laboratories regarding prostate cancer diagnosis was provided, and where
the classifier learned from one did not perform well enough on the other. Our
algorithm was capable of learning a transformation over the second dataset that
made the classifier fit just as well as it did on the first one. The validation results
with 5-fold cross validation also support the idea that the algorithm obtains
good results and generalizes well.

We  have  applied  a  statistical  analysis  methodology  that  supports  the  claim 
that  the  classifier  performance  obtained  on  the  solution  dataset  significantly 
outperforms  the  one  obtained  on  the  problem  dataset. 

Lastly, we have shown the learned transformations. Unfortunately, we have
not been able to extract any useful information from them yet.

ABSTRACT

Glandular tumors arising in epithelial cells comprise the majority of solid human cancers.
Glands are supported by stroma, which is activated in the proximity of a tumor. Activated
stroma is often characterized by the molecular expression of α-smooth muscle actin (α-SMA)
within fibroblasts. The precise spatial and temporal evolution of chemical changes in
fibroblasts upon epithelial tumor signaling, however, is poorly understood. Here we report a
label-free method to characterize fibroblast changes using Fourier transform infrared (FT-IR)
spectroscopic imaging by comparing spectra to α-SMA expression in primary normal human
fibroblasts. The fibroblast activation process was recorded by spectroscopic imaging using
increasingly tissue-like conditions: (a) stimulation using the growth factor TGFβ1, (b)
co-culture with MCF-7 human breast cancerous epithelial cells in Transwell co-culture and (c)
co-culture with MCF-7 in three-dimensional cell culture. Spectral signatures of stromal
transformation were finally compared to normal and malignant human breast tissue biopsies.
Results indicate that temporally complex spectral changes are observed, providing a richer
assessment than simple molecular imaging based on α-SMA expression. Some changes are
conserved across culture conditions and in human tissue, providing a label-free method to
monitor stromal transformations.

INTRODUCTION


The stroma is known to play a crucial role in epithelial cancer progression in a variety of
tissues (1-4). The stroma has also been suggested as an alternative and potentially more
effective therapeutic target because the vast heterogeneity in the genomic and histological
makeup of epithelial tumors makes individualized treatment expensive and unreliable (5).
Methods to characterize the stroma and its transformations in epithelial tumor progression
are, hence, imperative. One hallmark of a cancer-associated stroma, for example, is the
fibroblast-to-myofibroblast cellular transformation (6). This phenotypic change is
characterized by the expression of α-smooth muscle actin (α-SMA), a cytoplasmic protein
that increases the cell’s contractility and leads to the stiffening of the tumor
microenvironment (7). The fibroblast-to-myofibroblast transformation has been observed
within tumor-adjacent stroma in human tissues (8-10). A similar response can be induced by
exposing fibroblasts to elevated levels of transforming growth factor-β1 (TGF-β1) in cell
culture (11). Because of the readily observable transition and its effect on the physical
properties of the tissue, stromal myofibroblasts have been a focus of research and are
important markers in glandular cancers such as breast cancer (10,11). Immunohistochemistry
(IHC) is the gold standard for visualizing α-SMA expression in clinical samples, but
antibody-based techniques are time-consuming and costly, and quantifying protein expression
with them is difficult (12). The stromal response, further, is likely more complex than
characterized by this single marker. Though immunofluorescence has made considerable
progress (13), only a few known proteins can be detected simultaneously. Even this
capability may not be sufficient to catalogue the varied cytopathic effects of a
multifactorial disease like cancer. Alternative techniques to directly measure cellular
transformations in a consistent, quantitative and multiplexed manner are needed.

As an alternative to molecular imaging, label-free chemical imaging approaches have
recently provided reliable correlations between histopathologic status and spectral markers
(14-16). Fourier transform infrared (FT-IR) spectroscopic imaging, in particular, has been
used extensively to study biochemical changes within cells as well as differences between
cell lines (17-19). Molecular expression in simple breast cell cultures, too, has been
correlated to spectral properties in both IR (20) and Raman spectroscopy (21). These studies
have focused on epithelial cells. The fibroblast response to epithelial transformations has not
been studied in vitro using spectroscopic imaging techniques. Here, we describe a method for
characterizing and analyzing the fibroblast-to-myofibroblast differentiation. We specifically
seek to examine the correlation between the current gold standard antibody marker and the
spectroscopic signature of transformation. We examine transformation in primary normal
dermal fibroblasts activated with TGFβ1, in co-cultures of primary fibroblasts with
tumorigenic breast epithelial cells (MCF-7) and in human tissues. While co-culture models
provide a tissue-like environment, TGFβ1 activation is used as a positive control because it is
commonly used in research (7). This comprehensive examination of cells grown in
two-dimensional (2D) and three-dimensional (3D) cell cultures as well as human breast tissue
is intended to ensure wide research and clinical relevance. Finally, the presence of other cell
types is a potentially confounding analytical factor and it is not obvious that spectral
correlations will hold for mixtures of cell types. Hence, this study is important from the
perspective of clinical cancer progression, for research in correlating labeled and label-free
approaches as well as in the analysis of samples that present a complex bioanalytical
background.


METHODS 


Experimental  Design 
Cell  Culture  Models 

To observe the effects of cancerous breast epithelium on the surrounding tissue stroma, two
co-culture methods were utilized. First, the Transwell co-culture (Figure 1A) allows for two
cell types to communicate via soluble growth factors that diffuse into a shared medium (22).
Second, a 3D cellular co-culture model (Figure 1B) was developed. The 3D model consists
of cells embedded in a type I collagen hydrogel. While both systems are essentially
mediated by soluble growth factors, cells adhere to a solid substrate in the 2D model and
have a different geometry and physical microenvironment. While 2D monolayer cultures are
the staple of cell biology, 3D cultures have recently been shown to be a more realistic
representation of biological phenomena that occur in tissue (23-28). Hence, an analysis of
both systems using our approach serves to ensure that the developed method is robust and
relevant to different communities of researchers. In addition to these co-cultures, we sought
to demonstrate that the developed methods are also valid for 2D and 3D cultures of single
cell types. Therefore, we stimulated single cell type cultures with TGFβ1 to validate the
observed activation.

For 2D cultures, primary normal human dermal fibroblasts (NDF) were grown on MirrIR
slides, which allowed for both FT-IR transflectance and immunofluorescence imaging. The
fibroblasts were co-cultured with cancerous breast epithelial cells, MCF-7, or stimulated with
TGFβ1. MCF-7 cells were derived from a human breast tumor that had metastasized to the
lung, but maintain a cancer stem cell-like phenotype in culture (29). They are less aggressive
when injected into nude mice compared with other human breast cancer cell lines and, hence,
were used in this study as a model of an ‘early’ cancerous source. Samples were removed
from the culture at specific time points (0 h, 6 h, 12 h, 24 h) and fixed. Half of the
samples were analyzed using immunofluorescence to detect α-SMA. The other half were
spectroscopically imaged. For 3D cultures, samples were prepared as separate layers, with
one cell type (NDF, MCF-7) per layer in a type I collagen matrix. The layers were co-cultured
for a determined length of time, and then separated with forceps. There was no observed cell
migration within the time intervals of this experiment, as determined by cell type specific
expression of cytokeratins for epithelial cells and vimentin for fibroblasts (data not shown).
Briefly, the layers were separated, stained using standard IHC methods, and subsequently
imaged using a Zeiss Axiovert 200M. A similar 3D model has been demonstrated previously
to study skin cancer (29). While the experimental methods of both this study and the
engineered skin model are capable of studying epithelial-fibroblast interactions, the
predefined geometry here allows observations of molecular changes without
morphology-associated effects or changing molecular concentration of a growing tumor that
may confound the temporal profile.

Cell  Culture 

Cell lines and Use. Normal adult primary dermal fibroblasts (NHDF, Lonza, #CC-2511)
were maintained in Fibroblast Basal Medium supplemented with 0.1% hFGF-B, 0.1%
Insulin, 0.1% gentamicin/amphotericin-B, and 2% fetal bovine serum (FBS) (FGM-2
Fibroblast Growth Medium-2 Bullet Kit, Lonza, #CC3132). They were used at passage 8-10
to avoid problems associated with senescence in primary cell lines. The fibroblasts were
subcultured according to protocols detailed on the Lonza website, and their ReagentPack
(Trypsin/EDTA, Trypsin Neutralizing Solution, HEPES Buffered Saline Solution,
#CC-5034) was used exclusively with this cell type. Serum-free medium was prepared in
the same manner, except that FBS was omitted. MCF-7 (ATCC) cells were
maintained in Dulbecco’s Modified Eagle’s Essential Medium (Invitrogen) supplemented
with 10% FBS (Sigma) and 1% PenStrep (Sigma). They were subcultured according to
ATCC protocols every 3 days at 70% confluency.

2D cell culture. Fibroblasts were grown on sterilized MirrIR slides (Kevley Technologies,
Chesterland, Ohio, USA). They were seeded at approximately 60% confluency and grown
for 24 hours before being switched to serum-free medium (FGM-2, Lonza, with additives but
not FBS). The samples were grown in serum-free medium for 24 hours before co-culture.
MCF-7 cells were grown on Transwell inserts (Corning, 0.1 µm pore size) in normal growth
medium for 24 hours and then switched to serum-free medium for an additional 24 hours
before co-culture.

Transwell co-culture. The Transwell co-culture system is useful for spectroscopy, because
any IR substrate can be used in the lower chamber of the culture dish (Transwell inserts,
0.1 µm pore, PES, Corning Incorporated, Corning, NY, USA). Here, pieces of MirrIR Low-E
slides were sterilized once with 10% bleach followed by 70% ethanol and left to dry in a
sterile biosafety cabinet before use. Immunofluorescence staining was also performed using
the MirrIR slides, and there were no detrimental effects on the coated glass surface. Using
the MirrIR slides for both immunofluorescence and FT-IR measurements ensured that there
was no substrate-specific factor that could have induced α-SMA expression independent of
soluble growth factors. After 0 h, 6 h, 12 h, and 24 h of co-culture, each MirrIR slide was
rinsed with sterile 1X PBS before fixation in 4% paraformaldehyde for 1 hour at 4°C. After
fixation, paraformaldehyde was neutralized with 0.1 M glycine for 10 minutes. Subsequently,
the samples were divided: for each time point, two samples were prepared for FT-IR imaging
and two were prepared for immunofluorescence staining. The samples for FT-IR imaging
were rinsed with de-ionized water and left to dry prior to imaging.

3D cell culture. Cells were maintained as previously described in two-dimensional culture
before being suspended in collagen hydrogels (Type I derived from rat tail, BD Biosciences).
All reagents were kept on ice before plating because collagen solution will gel slightly at
room temperature. In a conical tube on ice, collagen stock solution was diluted to 2 mg/mL
with sterile 10X PBS. Cells were trypsinized, centrifuged at 1000 rpm for 3 minutes, and
resuspended in growth medium. After counting, cells were suspended in the collagen solution
at a cell density of 420 cells/mL for NDF and 1.9×10⁴ cells/mL for MCF-7. A much lower cell
density of fibroblasts was used compared with epithelial cells, both because fibroblasts tend
to collapse the hydrogel at high cell density and after activation, and because the lower
density is similar to fibroblast density in real tissue. Finally, 1N NaOH was added at
0.023 mL per 1 mL of collagen stock solution to neutralize the acetic acid and allow the
collagen to gel. To prepare samples, 200 µL of collagen and cell suspension was added to
each well of a 48-well tissue culture plate. The plates were left at 4°C for 90 minutes to
slow down the polymerization of collagen to provide a more uniform fiber orientation and
width (31). The samples were then placed in a humidified incubator at 37°C for 30 minutes
to polymerize the collagen with cells embedded within. After the samples had gelled, growth
medium was added. The cells were allowed to grow for 24 hours before being changed to
serum-free medium to avoid any confounding effects of growth factors present in FBS. After
48 hours in serum-free medium, the co-culture layers were stacked together and 1.5 ng/mL
TGFβ1 (Transforming Growth Factor-β1 from human platelets, > 97%, Sigma, #T1654) in
serum-free medium was added to the appropriate fibroblast samples as a positive control.
Fresh serum-free medium was added to the co-cultured samples. After 0, 6, 12, and 24 hours
of co-culture, the fibroblast layer was fixed in 4% paraformaldehyde overnight before
processing for immunofluorescence or FT-IR imaging.

Three-dimensional culture sample preparation. 3D culture samples were paraffin embedded
and sectioned prior to imaging. First, the paraformaldehyde was gently aspirated from the
samples and then the gels were dehydrated by serial ethanol dehydration. The samples were
put in 50%, 70%, 80%, and 95% ethanol for 45 minutes each, followed by three 45-minute
incubations in 100% ethanol. The samples were then soaked in xylenes for three 45-minute
periods. Finally, the samples were placed in paraffin in a 60°C oven for two 1-hour periods
and one 12-hour period. The samples were mounted in paraffin blocks and sectioned at 5 µm
onto MirrIR slides for FT-IR imaging. Samples were de-paraffinized in hexanes for 24 hours
before imaging. For each set of experiments, samples were prepared in duplicate and the
experiment was replicated independently to show reproducibility of both biological results
and absorbance spectra.

Immunofluorescence Staining. For immunofluorescence staining, samples were
permeabilized in 0.2% TX-100 for 15 minutes. After washing three times with PBS, the
samples were blocked with 1 wt% BSA in PBS/T for 1.5 hours. After three washes with
PBS/T, the samples were incubated with primary antibody (Mouse anti-human α-SMA,
Dako, 1:100 dilution) overnight at 4°C. The samples were washed again and incubated with
secondary antibody (Goat anti-Mouse IgG-FITC conjugated, Abcam, 1:80 dilution) for one
hour. The samples were mounted with UltraCruz Mounting Medium for Fluorescence with
DAPI (Santa Cruz Biotechnology, Cat # sc-24941) and imaged using a Zeiss Axiovert 200M
fluorescence microscope. For three-dimensional samples, confocal imaging was done using a
Leica SP2 laser scanning confocal microscope.

Immunohistochemistry: Tissue Biopsies. A tissue microarray (TMA) of 96 1.5 mm human
breast tissue cores comprising normal, epithelial hyperplasia, in situ, benign tumor and
malignant cancer tissues was obtained (US Biomax, Inc., USA, #BR961). Four serial sections
were acquired from the TMA block: one 5 µm thick tissue section was placed on a BaF2
substrate for FT-IR analyses and three 5 µm thick tissue sections were placed on standard
glass slides for IHC and hematoxylin and eosin (H&E) staining. IHC staining was performed
for vimentin and α-SMA. H&E staining was used for tissue visualization. Staining was
performed using a Ventana Benchmark XT Automated Slide Preparation system (Ventana
Medical Systems, Inc.) and Ventana clinical protocols and reagents (XT UltraView DAB
protocol, Ventana, Tucson, AZ).

FT-IR spectroscopic imaging. FT-IR spectroscopic imaging data were recorded using a
Perkin Elmer Spotlight 400 imaging system. For all cellular samples, both confluent and
sparse regions of the sample were imaged in transflection mode and data from 4000 cm⁻¹ to
750 cm⁻¹ were saved. A spectral resolution of 8 cm⁻¹ was set with 32 scans per pixel averaged
to provide higher signal-to-noise ratio data. An interferometer speed of 1.0 cm/s was used,
while a pixel size of 6.25 × 6.25 µm was used for detection using an MCT linear array. A
spectral background was collected on the MirrIR slide using the same parameters but with
120 scans per pixel. Atmospheric correction was performed on the Spotlight instrument, and
the files were exported into ENVI-IDL. Images were baseline corrected and only those pixels
with an absorbance greater than 0.015 a.u. for the peak absorbance at 1656 cm⁻¹ (Amide I)
were used for further analysis. Spectra were normalized to Amide I to account for variances
in cell density.
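
The pixel selection and normalization described above can be illustrated with a short sketch. It is not the ENVI-IDL processing used in this study: the function names are hypothetical, the spectra are assumed to be already baseline corrected, the Amide I band is taken as the recorded wavenumber nearest to 1656 cm⁻¹, and "normalized to Amide I" is assumed to mean division of each spectrum by its own Amide I absorbance.

```python
AMIDE_I = 1656        # cm^-1, peak used for thresholding and normalization
THRESHOLD = 0.015     # a.u., minimum Amide I absorbance for cell-culture pixels


def nearest_band(wavenumbers, target):
    """Index of the recorded wavenumber closest to the target."""
    return min(range(len(wavenumbers)), key=lambda i: abs(wavenumbers[i] - target))


def select_and_normalize(spectra, wavenumbers, threshold=THRESHOLD):
    """Keep pixels whose Amide I absorbance exceeds the threshold, scaled so Amide I = 1."""
    k = nearest_band(wavenumbers, AMIDE_I)
    kept = []
    for spectrum in spectra:
        if spectrum[k] > threshold:                      # discard empty or sparse pixels
            kept.append([a / spectrum[k] for a in spectrum])
    return kept


# Toy data: three pixels, three bands (absorbance values are made up).
wavenumbers = [1080, 1224, 1656]
spectra = [[0.01, 0.02, 0.10], [0.001, 0.002, 0.010], [0.03, 0.03, 0.06]]
kept = select_and_normalize(spectra, wavenumbers)
```

In the toy data the second pixel falls below the 0.015 a.u. threshold and is dropped; the remaining spectra are rescaled so that band ratios, rather than absolute absorbance (which depends on cell density), are compared.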

For tissue microarray data, absorbance was stronger and we sought to maintain compatibility
with earlier studies on the parameters used. Sample data were collected using the same
scanning parameters as for cell culture samples, except that a 4 cm⁻¹ resolution with 2 scans
per pixel and a mirror speed of 2.2 cm/s were used. A background was acquired at these
parameters with 120 scans averaged. A threshold absorbance of 0.03 a.u. for the 1656 cm⁻¹
(Amide I) absorbance peak was employed to determine pixels to be included in the analysis.
Regions of Interest (ROIs) were manually marked on the absorbance images corresponding
to regions of either fibroblast or myofibroblast cells. Cell-type assignations were made based
on the IHC staining of the serial tissue sections; fibroblasts stained positive for vimentin and
negative for α-SMA, and myofibroblasts stained positive for both vimentin and α-SMA. Over
40,000 pixels corresponding to fibroblasts and over 150,000 pixels corresponding to
myofibroblasts were identified. From these identified pixels, average spectra were obtained
for fibroblast and myofibroblast classes.


RESULTS  AND  DISCUSSION 

The  well-characterized  fibroblast  activation  pathway  serves  as  a  model  system  to  benchmark 
spectral (chemical) changes that accompany the phenotypic transformation. The transwell
co-culture system was used first to determine whether co-culturing normal primary fibroblasts
with tumorigenic breast epithelial cells could result in an activated phenotype, as shown
previously in fibroblasts isolated from stroma surrounding a tumor in vivo (7) as well as after
induction by TGFβ1 in vitro (9). In our co-culture with MCF-7 cells, phenotypic changes
were induced in the primary dermal fibroblasts within 6 hours to the same extent as treatment
with 1.5 ng/mL TGFβ1 (Figure 2). The experiment was repeated for both cases over a time
course  of  24  hours,  with  timepoints  being  taken  at  0  (no  co-culture),  6,  12,  and  24  hours  to 
observe  any  potential  evolution  of  this  marker  over  time.  From  immunofluorescence  imaging 
results, there was no visible change in the number of cells expressing α-SMA over
time. Digitally assisted methods were not used to compare intensity levels, because
quantitative intensity analysis is difficult due to non-specific fluorescence and
photobleaching. Both stimulation with TGFβ1 and co-culture with MCF-7 activated
fibroblasts within 6 hours.

We  hypothesized  that  examining  the  temporal  evolution  of  IR  absorption  spectra  would  yield 
more information about fibroblast activation than the “on-off” information derived from
immunofluorescence expression of a single biomarker. Spectra measured from fibroblasts are
shown in Figure 3. Changes were primarily seen in the biomolecular fingerprint region
(1800-950 cm⁻¹) and also in the C-H stretching region (3000-2875 cm⁻¹). In the fingerprint
region, larger changes were seen in peaks at 1080 cm⁻¹ and 1224 cm⁻¹ (Figure 3, top). These
are the symmetric and asymmetric vibrational modes of the phosphate bond, respectively,
indicative of changes in nucleic acids. There is an increase in the 1080 cm⁻¹ peak, which is
usually associated with the symmetric phosphate stretching of DNA. These spectra are
normalized to account for cell density, and the cells were serum-starved before the experiment
began, so there should have been little cellular proliferation over the 24-hour time course.
Serum-starving the cells prior to co-culture arrests them at G0/G1, and this should minimize
spectral differences at 1080 cm⁻¹ that can be attributed to cells being in different phases of
the cell cycle (32-34).
The increase at 1080 cm⁻¹ indicates that, unless there is an increased amount of DNA present
in the cells, the assignment of this peak to the phosphate bond of DNA alone may be
uncertain. If we account for the total amount of genetic material present within the cell
(RNA, DNA, and associated proteins), this could provide some explanations for the increases
in absorbance at this peak. The spectral changes seen could be due to an increase in RNA,
changes in chromatin three-dimensional configuration, chromatin sequestration, or an
increase in the size of the nucleus. As recently reported by Whelton et al., changes in the 1080
cm⁻¹ peak are attributed to a transition between native B- and A-like forms of DNA upon
dehydration of intact cells. We do not anticipate that changes seen in these experiments are
due to this transition because all samples were fixed and dried completely prior to
spectroscopic imaging (35).

Between the two treatments (MCF-7 co-culture and TGFβ1 stimulation), there was similar
molecular expression of α-SMA, but differences in absorption at 1080 cm⁻¹. In the
TGFβ1-stimulated samples, the 6- and 12-hour samples had an increase in absorption at 1080 cm⁻¹
compared with the control, but after 24 hours the level had fallen back to the control value.
Interestingly, at 1224 cm⁻¹, the 6- and 24-hour time points were elevated while the 12-hour
sample had lower absorption than the control. This discrepancy could be the result of the
cells only being stimulated with TGFβ1 once at the beginning of the experiment. Thus, the
6-hour sample would have a sustained level of TGFβ1 in the medium before the cells were
fixed, whereas in the 24-hour sample the concentration of TGFβ1 present in the medium had
decreased because it had already been metabolized by the cells. However, in the samples that
were co-cultured with MCF-7 cells, there was a uniform level of growth factors secreted by
the epithelial cells into the shared medium throughout the time course of the experiment.
Therefore, we believe that the absorbance at 1224 cm⁻¹ may be used as a marker for a
sustained fibroblast response to molecular signals released by a malignant epithelium.
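
The comparison of each time point against the control at a single band, as discussed above, can be sketched as follows. This is an illustrative sketch only: the absorbance numbers are placeholders and not measured values, and the function names, the tolerance, and the 'sustained'/'transient' labels are assumptions introduced for the example.

```python
def band_change_vs_control(band_absorbance, control_key="0h"):
    """Difference between each time point and the control at one band."""
    control = band_absorbance[control_key]
    return {
        t: a - control
        for t, a in band_absorbance.items()
        if t != control_key
    }


def classify_response(changes, tolerance=0.01):
    """Label a time series 'sustained' if the final time point is still
    above control by more than `tolerance`, 'transient' if an earlier
    point was elevated but the final one was not, and 'none' otherwise."""
    ordered = list(changes.values())  # dicts keep insertion (time) order
    if ordered[-1] > tolerance:
        return "sustained"
    if any(c > tolerance for c in ordered[:-1]):
        return "transient"
    return "none"


# Placeholder normalized absorbances at one band; illustration only.
example_band = {"0h": 0.20, "6h": 0.26, "12h": 0.25, "24h": 0.20}
changes = band_change_vs_control(example_band)
label = classify_response(changes)
```

With these placeholder numbers the early time points are elevated and the final one returns to the control level, which is the qualitative pattern described for 1080 cm⁻¹ after a single TGFβ1 stimulation.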

In the C-H stretching region, changes were seen in peaks at 2850 cm⁻¹, 2930 cm⁻¹, and 2960
cm⁻¹. This region of the spectrum is correlated with proteins and also the carbonyl chains of
fatty acids (31). With increasing lengths of time after TGFβ1 stimulation, there was a gradual
increase in peak height across all peaks in this region (Figure 3B, bottom). In contrast,
co-culture with MCF-7 cells yielded a fibroblast response that was more defined, with a very
rapid increase in peak height at 2930 cm⁻¹ after just 6 hours in comparison with the control
(Figure 3A, bottom). Although immunofluorescence results show α-SMA expression in
samples stimulated with TGFβ1 or co-cultured with MCF-7 cells, there were differences in
absorption spectra between the two sets of samples, permitting a more in-depth biochemical
analysis of cellular activation. The reasons for the difference in kinetics of activation likely
stem from the co-culture providing a host of molecules in the activation pathway via
paracrine signaling. While the mechanisms of the two activations are likely different, this
would not be apparent from a single marker. It is also interesting to note the ability of
spectroscopy to measure transient behavior, which the expression of α-SMA does not capture.

Vibrational spectroscopy, of course, does not provide specific protein expression levels in
cells. As a general strategy for comprehensive biomolecular analysis, hence, the
spectroscopic data can be used to inform the search for appropriate molecular markers by
providing the temporal evolution profiles. Further, there may be cellular and sub-cellular
spectral  heterogeneity  across  the  sample  upon  stimulation.  These  have  been  examined 
elsewhere  (36). 

In  contrast  to  the  2D  transwell  co-culture,  culturing  cells  in  a  3D  geometry  provides  an 
environment  that  is  closer  to  cellular  chemistries  in  vivo.  Cells  are  known  to  express  surface 
receptors  more  faithfully  in  three-dimensional  culture  and  are  also  more  likely  to  differentiate 
in  response  to  external  stimuli  (37-39).  In  the  3D  co-culture  system  described  here,  a  single 
cell type and collagen scaffold were used to fabricate a cylindrical “layer” (Figure 1B).
Layers containing different cell types were prepared separately and subsequently stacked on
top of each other. This technique allowed for cells to be co-cultured by simple stacking.
Since the layers are only weakly adherent, they could subsequently be mechanically
re-separated for analysis. As previously used in 2D cell culture, immunofluorescence staining
for α-SMA was used to probe for the presence of myofibroblasts in 3D using confocal
microscopy (Figure 2C). The immunofluorescence results remained consistent with the
transwell co-culture results; exposure to MCF-7 cells activated fibroblasts along the same
time course as TGFβ1 exposure. Another use for three-dimensional cell culture in this setup
is that the collagen peaks (1283 cm⁻¹, 1236 cm⁻¹, and 1204 cm⁻¹) can be used for IR spectral
analysis, either as a control or for examination of microenvironmental changes associated
with a growing tumor. These collagen peaks are diagnostically useful when looking at whole
tissue sections (15), and there is evidence showing that changes in collagen spectra can be
detected within a certain distance from a tumor (30,40), which is clinically relevant for
cancer pathology. Hence, we examined the same in the 3D co-culture model (Figure 4A).

The  only  major  observation  in  our  study  was  an  overall  increase  in  the  absorption  of  the 
collagen  peaks  after  co-culture  with  MCF-7  cells  over  time.  This  could  be  a  result  of 
fibroblasts  locally  depositing  collagen  upon  exposure  to  MCF-7  stimuli.  Myofibroblasts  play 
an  important  role  in  tissue  maintenance,  providing  a  wound  healing-type  response  by 
depositing more collagen in the surrounding extracellular matrix (41). TGFβ1 also stimulates
fibroblasts to deposit collagen via the Smad pathway, which aids in the transcription of the
α2(I) procollagen gene, COL1A2 (42). It is suggested that the fibroblasts present in
collagen-dense keloid scars are more susceptible to TGFβ1 (43). Further, the extracellular
matrix can act as a control mechanism for the involvement of TGFβ1 in collagen biosynthesis
(44). The other possibility is that upon fibroblast activation, the stiffening of the cells
themselves results in the contraction of the surrounding gel, making local regions appear more
collagen-dense in the absorption spectra. However, no detectable gel contraction was observed
upon visual inspection during the time course of this experiment, likely due to the low cell
density of fibroblasts embedded within the collagen matrix. For these reasons, we believe that
spectral changes seen in this model are indicative of collagen remodeling by cancer-activated
fibroblasts.

Consistent with the 2D culture results, changes were seen in the 1080 cm⁻¹ peak in the 3D
culture model. There was an increase in this peak initially; however, after 24 hours this peak
had diminished. The ‘ebb and flow’ of this nucleic acid signature, even in the environment of
persistent epithelial cues, suggests that fibroblasts’ molecular expressions settle into a new
equilibrium upon an initial exposure to transforming stimuli. This observation is also
consistent with changes seen in the Transwell co-culture (Figure 3, bottom). There is no
change in RNA levels (1224 cm⁻¹) seen in this model compared with the Transwell-culture
model. This could be a result of there being less cytoplasmic material from which to record
data, as cells appear smaller in the 3D matrix and thus the cytoplasm is much smaller relative
to the nucleus of the cells. Peaks in the C-H stretch region, as with other cultures, may correlate
with  changes  in  the  phospholipid  membrane  or  protein  synthesis  after  fibroblast  activation. 
Either  explanation  is  plausible  considering  the  physiologic  changes  that  occur  during  the 
fibroblast  to  myofibroblast  phenotypic  change.  However,  in  the  12-  and  24-  hour  time  points, 
there  is  a  significant  decrease  in  absorption  in  this  region  compared  with  the  control.  In 
general,  absorbance  in  this  area  was  low  compared  with  samples  cultured  in  monolayers  due 
to  cells  in  3D  cultures  being  sparsely  populated  and  of  thinner  shape  than  2D  cultures.  Thus, 
their  accurate  monitoring  is  much  more  challenging  than  2D  monolayers.  In  the  C-H 
stretching  region,  biochemical  changes  are  dominated  by  events  occurring  in  the  cytoplasm 
of cells (36) as were the changes at 1224 cm⁻¹. In the 3D culture, as with tissues, it is
productive  to  examine  changes  due  to  cellular  secretion  of  growth  factors  in  the  surrounding 
extracellular  matrix.  Examining  changes  within  the  cells  themselves  requires  a  subcellular 
localization  of  signals  that  is  not  achieved  here. 

To  translate  the  understanding  from  these  studies,  clinical  breast  tissue  samples  were 
examined by IR imaging and immunohistochemical staining, including for vimentin and
α-SMA. Vimentin (Figure 5B) will stain for fibroblasts and other mesoderm-derived tissues. In
contrast, α-SMA (Figure 5C) is a protein found in myofibroblasts, myoepithelium that lines
each gland, and smooth muscle cells which surround blood vessels. Hence, we were able to
differentiate between normal and activated fibroblasts by comparing the localization of these
two markers in adjacent sections of tissue. In the clinical breast tissue samples, vimentin (in
brown) is primarily seen in the stroma between glands (Figure 5B). However, only
fibroblasts nearest the cancerous epithelium express α-SMA (Figure 5C). This is a
cancer-associated signature and is diagnostically relevant. In order to provide a critical comparator
to  the  work  performed  in  our  monolayer  and  three-dimensional  co-culture  models,  we 
examined  spectral  differences  between  activated  and  resting-state  fibroblasts  in  these  clinical 
samples. IR spectroscopic imaging was performed on an entire TMA. Based on the staining
of adjacent sections, pixels were manually marked and classified as ‘fibroblast’ or
‘myofibroblast’. The pixels for each class were averaged and these average spectra across the
TMA were examined, as shown in Figure 6. Spectra were compared with the 3D results,
because this model should be biologically closest to clinical samples. However, upon
examination, the results between the two models are inconsistent. Although the collagen
peaks (1300-1050 cm⁻¹) and C-H stretching region (3000-2800 cm⁻¹) are consistent in shape
between the three-dimensional culture model and the tissue sample, myofibroblasts from the
clinical samples show lower absorbance in the biomolecular fingerprint region (Figure 6A).
Spectra  from  the  3D  cultures  were  ‘pure’,  consisting  only  of  normal  or  activated  fibroblasts 
and  type  I  collagen.  Using  clinical  samples  is  invaluable,  but  leads  to  more  variables  that 
become  increasingly  difficult  to  control.  For  example,  there  is  a  large  degree  of  variance 
between  patients,  even  for  the  noncancerous  biopsies  (unpublished  data,  M.J.  Walsh). 
Another interesting question is whether immunohistochemical stains, used here as the gold
standard  for  comparison,  are  truly  as  reliable  in  clinical  samples  as  in  cell  culture  studies. 

Across  the  three  systems  described  in  this  manuscript,  activated  fibroblasts  display  spectral 
changes in the mid-IR regions associated with nucleic acids (1080 cm⁻¹, 1224 cm⁻¹) and C-H
stretching modes (2850 cm⁻¹, 2930 cm⁻¹, 2960 cm⁻¹). Although 2D and 3D co-culture models
were  mostly  consistent,  we  found  some  discrepancies  between  the  in  vitro  and  the  clinical 
specimens.  Studying  this  transition  with  FT-IR  spectroscopy  under  controlled  cell  culture 
conditions  yields  important  information  about  the  potential  kinetics  of  paracrine  signaling 
between  epithelial  cells  and  fibroblasts.  Investigating  the  C-H  stretching  region  of  fibroblasts 
also  results  in  overall  increased  absorbance  at  all  three  peaks  across  all  scenarios,  including 
human  breast  tissue  biopsies  (Figure  6B).  The  nature  of  fibroblast  activation  involves  a 
cellular  phenotypic  change,  where  cytoplasmic  proteins  are  produced,  and  the  shape  of  the 
cell  undergoes  a  transformation.  These  biological  phenomena  can  be  correlated  with  the 
increased  absorbance  in  peaks  associated  with  C-H  stretching  as  a  marker  for  a  cancer- 
activated  stromal  profile.  Because  FT-IR  spectroscopic  imaging  can  be  used  to  study  the 
distribution  of  chemical  changes  across  the  area  of  a  sample,  this  understanding  can  be 
applied  to  detect  early  stromal  activation  in  noncancerous  areas  of  a  biopsy  or  tissue 
resection  independent  of  the  expression  of  a  biomarker.  This  same  technique  could  be 
expanded  to  different  biological  problems,  such  as  testing  the  effects  of  drug  delivery  on 
distal tissues. By correlating these biological phenomena observed in cell culture with
chemical signatures, label-free imaging of complex human tissues becomes easier to interpret.

FT-IR  spectroscopy  and  imaging  have  been  employed  to  measure  a  wide  variety  of 
biomolecular  species,  including  nucleic  acids,  collagen,  glycogen,  proteins,  and  fatty  acids. 
The complex mixtures of these molecules present in cells and tissues imply that IR
spectroscopy  is  useful  for  determining  global  biochemical  changes  in  classes  of  these 
materials  and  is  sensitive  to  the  metabolic  (45)  and  local  physiologic  state  of  the  tissue.  In 
this  study,  we  demonstrate  that  the  method  extracts  more  detailed  changes  compared  to 
conventional  immunofluorescence.  The  correlations  of  these  changes  with  mechanistic 
molecular  transitions  in  the  cell  can  now  be  established.  This  next  step  will  link  many  events 
in the transformation to a simple, label-free measurement. Since IR imaging data are a
convolution  of  the  underlying  spectral  and  structural  properties  of  the  tissue  (46)  and  the 
imaging  setup  (47-49)  and  optical  properties  (50-52),  measurement  of  specific  molecular 
alterations  becomes  very  challenging.  Nevertheless,  we  show  that  there  is  conservation  of 
some  changes  in  the  fibroblast-to-myofibroblast  transformation  that  translates  across 
monolayer  culture,  three-dimensional  culture,  and  human  tissues.  In  summary,  IR  absorption 
imaging  provides  a  label-free  approach  for  integrated,  first-pass  approaches  that  can  yield 
information  about  changes  in  the  sample.  Such  information  can  provide  a  basis  for  studies  by 
itself or an early indication of which biological assays to perform next and is especially
critical for heterogeneous samples in which we need to determine where to perform further
molecular analysis.


CONCLUSIONS 

Normal  adult  human  fibroblasts  were  examined  in  monolayer  and  three-dimensional  cell 
cultures  as  well  as  formalin-fixed  and  paraffin  embedded  human  tissue  to  correlate  the 
expression of α-SMA using immunofluorescence techniques to chemical changes, as
observed using FT-IR spectroscopic imaging. Spectral changes were observed predominantly
in the C-H stretching region (3290 cm⁻¹) and phosphate bonds associated with nucleic acids
(1224 cm⁻¹ and 1080 cm⁻¹). In 3D co-cultures and human tissue biopsies, the
microenvironmental changes were assessed by examining vibrational modes commonly
associated with collagen (1283 cm⁻¹, 1236 cm⁻¹, and 1204 cm⁻¹). Fibroblasts activated in vitro
via TGFβ1 stimulation or co-culture with breast cancer epithelial cells expressed α-SMA and
were  spectrally  distinct  from  resting-state  fibroblast  controls.  This  was  also  true  in  the  tissue 
biopsies.  However,  the  spectra  from  cellular  cultures  were  not  entirely  consistent  with  those 
from  tissue,  particularly  in  the  phosphate  peaks.  Although  the  overall  spectral  characteristics 
are  conserved  between  the  3D  culture  and  biopsies,  specific  absorbance  values  were 
inconsistent.  Furthermore,  there  is  a  spatial  dependence  of  this  expression  based  on  the 
distance  of  the  fibroblasts  from  the  tumor  ‘source’,  determined  by  analysis  of  the  collagen 
peaks and expression of α-SMA in tissue. By directly extracting spectral signatures of
fibroblast  activation,  analysis  can  potentially  provide  new  information,  be  conducted  in  a 
high-throughput  manner  and  reduce  variability,  time,  and  costs.  Finally,  this  work  exhibits  a 
novel  use  of  IR  spectroscopic  imaging  in  examining  stromal  changes  associated  with  tumor 
progression. 
Figure Legends


Figure  1.  (A)  Schematic  of  the  trans-well  co-culture  system  that  allows  the  cells  to 
communicate via soluble growth factors without contact. A filter with a 0.2 µm pore size is
used  at  the  bottom  of  the  top  basket.  (B)  Schematic  of  the  three-dimensional  (3D)  co-culture 
setup,  which  is  comprised  of  cells  embedded  in  a  type  I  collagen  gel.  No  membrane  separates 
the  layers. 

Figure 2. After six hours (6h) of stimulation with 1.5 ng/mL TGFβ1 (A) or 6h co-culture
with MCF-7 cells (B), α-SMA is expressed in dermal fibroblasts. Confocal microscopy (C)
was used to visualize α-SMA expression in fibroblasts for 3D systems. Scale bar represents
50 µm.

Figure 3. Top: In fibroblasts co-cultured with MCF-7 (A), or after 1.5 ng/mL TGFβ1
stimulation (B), normal dermal fibroblasts exhibit changes primarily in the asymmetric and
symmetric phosphate stretching bands, indicating bulk changes in the quantity of nucleic
acids over time, normalized to 1656 cm⁻¹ (Amide I). Fibroblasts activated through co-culture
show sustained levels of nucleic acids over time, whereas levels wane in TGFβ1-activated
fibroblasts.

Bottom: Comparison of the C-H stretching region for fibroblasts co-cultured with MCF-7
cells (A) and TGFβ1-stimulated fibroblasts (B). Peaks in the C-H stretching region of the
spectrum (2960 cm⁻¹, 2932 cm⁻¹, and 2850 cm⁻¹) have a much higher absorbance in the 12-
and 24-hour timepoints compared with control. This suggests an increase in cell metabolism
through the presence of higher amounts of fatty acids. After 6 hours of TGFβ1 stimulation,
fibroblasts show lower absorbance in this region compared with control and MCF-7
co-culture.

Figure 4. Characteristic absorbance peaks associated with collagen (1283 cm⁻¹, 1236 cm⁻¹,
and 1204 cm⁻¹) are visible and elevated in fibroblasts after co-culture with MCF-7 (A). At
1080 cm⁻¹ in both three-dimensional and two-dimensional culture (Figure 3B) the same
cyclical  phenomenon  is  shown.  The  C-H  stretching  region  of  the  spectrum  (B)  is  distinct 
from  that  of  the  transwell  co-culture  (Figure  4B)  spectra. 

Figure  5.  Cancerous  breast  tissue  biopsies  demonstrate  glandular  and  stromal  regions  to 
examine α-SMA expression proximal to cancerous epithelium. The morphological features
are distinguished using hematoxylin and eosin staining (A). Fibroblasts are discerned by
using IHC staining for vimentin (B). IHC staining for α-SMA (C) is positive for activated
fibroblasts (myofibroblasts), myoepithelium (found lining the gland), and smooth muscle
(found around blood vessels). α-SMA-positive fibroblasts are located adjacent to the
cancerous  epithelium,  but  distal  fibroblasts  are  negative  for  this  protein.  Each  tissue  core, 
part  of  a  tissue  microarray  (TMA),  is  0.5  mm  in  diameter. 

Figure 6. Pixels on a TMA were classified as fibroblasts or myofibroblasts; their
average spectra are shown here. Overall normalized absorption was higher for the fibroblast
class  compared  with  myofibroblasts  (A).  However,  in  the  C-H  stretching  region  (B), 
myofibroblasts  show  stronger  absorption  compared  with  fibroblasts  in  the  three  peaks  noted. 